JP2006340066A

JP2006340066A - Moving image encoder, moving image encoding method and recording and reproducing method

Info

Publication number: JP2006340066A
Application number: JP2005162627A
Authority: JP
Inventors: Masaaki Shimada; 昌明島田; Isao Otsuka; 功大塚; Kazuhiko Nakane; 和彦中根
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2005-06-02
Filing date: 2005-06-02
Publication date: 2006-12-14

Abstract

PROBLEM TO BE SOLVED: To allocate a large code amount to an important scene of a high viewing value by taking into consideration the degree of importance of semantic contents of moving images in re-encode dubbing by a moving image encoding system.

SOLUTION: When encoding video signals (110) and sound signals (111) supplied from the outside and recording them in a first recording medium (102), the video signals and the sound signals to be recorded are divided into respective prescribed sections, the feature of the sound of the sound signals in each section is extracted (143), and an encoding rate is decided on the basis of the feature of the sound indicated by the sound signals in each section (120, 121, 122, 123). When performing dubbing from the first recording medium (102) to a second recording medium (103), encoding is performed by the previously decided encoding rate (112, 113).

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a moving image encoding apparatus, a moving image encoding method, and a recording and reproducing method that are applicable, for example, when multimedia data files are recorded and dubbed by a video recording and reproducing apparatus such as a DVD recorder with a built-in hard disk drive (HDD).

Here, a multimedia data file is a single file in which video information encoded and compressed by, for example, the MPEG-2 system is multiplexed with audio information encoded and compressed by, for example, the AC-3 system. Dubbing is defined as the process of copying (or moving) a multimedia data file recorded on a first recording medium provided in the video recording and reproducing apparatus to a second recording medium.

Two general dubbing methods are widely known: “high-speed dubbing” and “re-encode dubbing”. High-speed dubbing is based on file copying. It can be performed at high speed, but the encoding conditions cannot be changed; for example, the encoding rate cannot be redistributed and the encoding parameters cannot be reset. Re-encode dubbing, by contrast, first decodes the encoded and compressed video and audio information contained in the multimedia data file recorded on the first recording medium, then encodes and compresses it again with the desired encoding rate and encoding parameters, and records the result on the second recording medium. In most cases the dubbing therefore takes as long as normal playback, but encoding conditions such as the distribution of the encoding rate and the encoding parameter settings can be changed. In other words, re-encode dubbing can redistribute the encoding rate to suit the video scenes.

In conventional re-encode dubbing, there is a moving image encoding apparatus (also called a two-pass encoding system) that, when a multimedia data file is normally recorded on the first recording medium, analyzes the required code amount of the input video (also called the encoding difficulty) for each fixed section and, at dubbing time, redistributes the code amount on the basis of the analyzed required code amount (see, for example, Patent Document 1). Similarly, there is a moving image encoding apparatus that resets the encoding parameters on the basis of the required code amount (see, for example, Patent Document 2).

Patent Document 1: Japanese Patent Laid-Open No. 2002-232882 (pages 5-6, FIG. 1)
Patent Document 2: Japanese Patent Laid-Open No. 2001-245303 (pages 3-4, FIG. 2)

In re-encode dubbing (or two-pass encoding) using the moving image encoding systems disclosed in the above patent documents, feature information is extracted from video, which carries an enormous amount of information. The feature extraction hardware therefore needs high processing capability, and the circuit scale and the feature extraction algorithm become complex. In addition, the code amount was redistributed and the encoding parameters were reset solely from the encoding difficulty of the images, regardless of how important the content is to the viewer. As a result, even scenes whose content matters to the viewer, such as a scoring scene in sports or the climax of a movie, were not allocated a large code amount, and image degradation such as block noise or image distortion occurred in those scenes. Hereinafter, this degree of importance of the content to the viewer is called the “viewing value” (also called the importance level).

An object of the present invention is therefore to provide a moving image encoding apparatus that calculates, from audio information, the degree of importance of the semantic content of the moving image to be encoded, so that code can be distributed appropriately according to viewing value at re-encoding time and a large code amount can be allocated to important scenes of high viewing value.

The present invention provides a moving image encoding apparatus comprising: audio feature extraction means for dividing a video signal and an audio signal into predetermined sections and extracting a feature of the audio of the audio signal in each section; encoding rate determining means for determining an encoding rate on the basis of the feature of the audio in each section; and means for encoding the video signal and the audio signal at the determined encoding rate.

According to the present invention, an index for determining the encoding rate at re-encoding time can be generated solely from audio physical quantities, which carry a small amount of information (for example, speaker identification by speech recognition technology, frequency analysis, volume, and audio attribute information). Re-encoding that reflects the feature information can therefore be performed even with a small circuit scale.

Embodiment 1.
FIG. 1 is a block diagram of the system configuration of Embodiment 1 of the present invention. In the figure, the system control unit 101 performs integrated control of the entire moving image encoding apparatus 100.
The hard disk drive 102 and the optical disk 103 are both recording means. In Embodiment 1, the hard disk drive 102 is used as the dubbing source recording medium (primary recording medium) for recording multimedia data files and metadata files, and the optical disk 103 is used as the dubbing destination recording medium (secondary recording medium) for recording re-encoded multimedia data files.
Files are recorded on and reproduced from the optical disk 103 through the recording and reproducing drive 104.

The moving image encoding apparatus 100 includes a buffer memory 115, a demultiplexer 130, a video decoder 131, an audio decoder 133, a video encoder 112, an audio encoder 113, a multiplexer 114, a video feature extraction unit 142, an audio feature extraction unit 143, a metadata generation unit 120, a metadata analysis unit 121, a recording rate map holding unit 122, and recording rate changing means 123.

The buffer memory 115 temporarily holds data read from the hard disk drive 102 and the optical disk 103, as well as data to be written to them.
The data held in the buffer memory 115 includes metadata in addition to multimedia data, which contains a video stream encoded by MPEG-2 or the like and an audio stream encoded by AC-3 or the like.

The demultiplexer 130 sequentially takes in the multimedia data file loaded into the buffer memory 115, separates it into an encoded and compressed video stream and audio stream, and outputs them.

The video decoder 131 decodes the video stream, encoded by MPEG-2 or the like, that is output from the demultiplexer 130, and outputs an output video signal 132.
The audio decoder 133 decodes the audio stream, encoded by AC-3 or the like, that is output from the demultiplexer 130, and outputs an output audio signal 134.
The monitor 140 receives the output video signal 132 from the video decoder 131 and the output audio signal 134 from the audio decoder 133, and displays the video and outputs the audio.

The video encoder 112 encodes, by MPEG-2 or the like, either the input video signal 110 supplied to the moving image encoding apparatus 100 or the video signal reproduced from the hard disk drive 102 and decoded by the video decoder 131, and generates a video stream. That is, as described later, when a program is recorded on the hard disk drive 102, the video encoder 112 encodes and compresses the input video signal 110 by MPEG-2 or the like to generate a video stream. At dubbing time, it receives the video signal reproduced from the hard disk drive 102 and decoded by the video decoder 131, and re-encodes it.
The video feature extraction unit 142 extracts feature amounts from the video signal to be encoded by the video encoder 112, either during recording or after recording and before dubbing. Examples include the amount of motion vectors between frames, the amount of change in the color histogram, and the detection of persons or faces by image recognition.

The audio encoder 113 encodes, by AC-3 or the like, either the input audio signal 111 supplied to the moving image encoding apparatus 100 or the audio signal reproduced from the hard disk drive 102 and decoded by the audio decoder 133, and generates an audio stream. That is, as described later, when a program is recorded on the hard disk drive 102, the audio encoder 113 encodes the input audio signal 111 by AC-3 or the like to generate an audio stream. At dubbing time, it receives the audio signal reproduced from the hard disk drive 102 and decoded by the audio decoder 133, and re-encodes it.
The audio feature extraction unit 143 extracts feature amounts from the audio signal to be encoded by the audio encoder 113, either during recording on the hard disk drive 102 or after recording and before dubbing. Examples include changes in the coefficient values after digital sampling or in frequency information, changes in the audio level, and the detection of speaker changes or applause scenes by speech recognition.
The video feature extraction unit 142 can be configured as part of the video encoder 112; likewise, the audio feature extraction unit 143 can be configured as part of the audio encoder 113.
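One of the audio features named above, the change in audio level per section, can be illustrated with a minimal sketch. It assumes decoded PCM samples held as floats and fixed-length sections; the function names `section_rms_levels` and `level_changes`, and the choice of RMS as the level measure, are illustrative assumptions and are not taken from the publication.

```python
import math
from typing import List, Sequence


def section_rms_levels(samples: Sequence[float], sample_rate: int,
                       section_seconds: float) -> List[float]:
    """Split PCM samples into fixed-length sections and return each section's RMS level."""
    section_len = max(1, int(sample_rate * section_seconds))
    levels = []
    for start in range(0, len(samples), section_len):
        chunk = samples[start:start + section_len]  # the last chunk may be shorter
        levels.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return levels


def level_changes(levels: Sequence[float]) -> List[float]:
    """Return the level change relative to the previous section (0.0 for the first)."""
    if not levels:
        return []
    return [0.0] + [cur - prev for prev, cur in zip(levels, levels[1:])]
```

A sudden positive change, for instance at the onset of cheering, could then feed whatever importance measure the metadata generation uses; how the level is turned into an importance value is left open here.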

The video feature extraction unit 142 and the audio feature extraction unit 143 extract feature amounts for each of predetermined sections that are synchronized with each other. Each predetermined section corresponds to a segment of the video signal or the audio signal defined by a predetermined time interval or data size.
The time interval corresponds, for example, to a video shot. A video shot consists of consecutive frames delimited at predetermined times.

The multiplexer 114 packetizes the video stream generated by the video encoder 112 and the audio stream generated by the audio encoder 113, multiplexes them together with playback time information, and sequentially records the result in the buffer memory 115.

The metadata generation unit 120 generates metadata on the basis of the feature information output from the video feature extraction unit 142 and the audio feature extraction unit 143. The generated metadata is written to the buffer memory 115 and then to the hard disk drive 102.
When necessary, the metadata is read from the hard disk drive 102, written to the buffer memory 115, and supplied to the metadata analysis unit 121 described later.
The metadata describes the importance level of each section mentioned above, for example each video shot, together with playback time information.

The metadata analysis unit 121 sequentially takes in the metadata loaded into the buffer memory 115, obtains the playback time information and the importance level described in the metadata for each section, for example each video shot, and generates re-encoding rate information associated with the playback time information for use in re-encoding such as dubbing.
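The step from per-shot importance levels to re-encoding rate information can be sketched as follows. The publication does not fix a mapping at this point, so the linear interpolation between a minimum and a maximum rate, the kbps figures, and the name `build_rate_map` are all assumptions made purely for illustration.

```python
from typing import List, Tuple

ShotEntry = Tuple[float, float, float]  # (start_s, end_s, importance); hypothetical layout
RateEntry = Tuple[float, float, int]    # (start_s, end_s, rate_kbps); hypothetical layout


def build_rate_map(shots: List[ShotEntry],
                   min_kbps: int = 2000, max_kbps: int = 8000,
                   lo: float = 0.0, hi: float = 1.0) -> List[RateEntry]:
    """Map each shot's importance level linearly onto [min_kbps, max_kbps]."""
    rate_map = []
    for start, end, importance in shots:
        clamped = min(max(importance, lo), hi)   # keep the level inside [lo, hi]
        fraction = (clamped - lo) / (hi - lo)    # 0.0 = least, 1.0 = most important
        rate = round(min_kbps + fraction * (max_kbps - min_kbps))
        rate_map.append((start, end, rate))
    return rate_map
```

With this mapping, a shot of importance 0.5 on a [0, 1] scale receives the midpoint rate, and more important shots receive proportionally more code.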

The recording rate map holding unit 122 holds the re-encoding rate information generated by the metadata analysis unit 121.
During re-encoding such as dubbing, the recording rate changing means 123 determines and outputs the encoding rate for each of the video encoder 112 and the audio encoder 113, on the basis of the re-encoding rate information held in the recording rate map holding unit 122 and in accordance with the playback time information.
The video encoder 112 and the audio encoder 113 operate at the encoding rates supplied from the recording rate changing means 123.
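How a rate changer might hand out rates as playback time advances during dubbing can be sketched as below. The class name `RateChanger`, the fixed audio rate, and the fallback rate for times not covered by the map are assumptions for illustration; the publication only states that a rate is determined for each encoder from the held rate information and the playback time.

```python
from typing import List, Tuple

RateEntry = Tuple[float, float, int]  # (start_s, end_s, total_kbps); hypothetical layout


class RateChanger:
    """Hands out (video_kbps, audio_kbps) as playback time advances during dubbing."""

    def __init__(self, rate_map: List[RateEntry],
                 audio_kbps: int = 256, default_kbps: int = 4000) -> None:
        self._map = sorted(rate_map)
        self._cursor = 0
        self._audio_kbps = audio_kbps      # assumed fixed audio rate
        self._default_kbps = default_kbps  # assumed fallback when no entry covers t

    def rates_at(self, t: float) -> Tuple[int, int]:
        # Playback time only moves forward while dubbing, so a cursor suffices.
        while self._cursor < len(self._map) and t >= self._map[self._cursor][1]:
            self._cursor += 1
        total = self._default_kbps
        if self._cursor < len(self._map) and self._map[self._cursor][0] <= t:
            total = self._map[self._cursor][2]
        return total - self._audio_kbps, self._audio_kbps
```

Because the cursor never moves backward, this sketch is only valid for monotonically increasing playback times, which matches a single dubbing pass.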

The metadata generation unit 120, the metadata analysis unit 121, the recording rate map holding unit 122, and the recording rate changing means 123 may be implemented partly or wholly in software, and that software may be incorporated in the system control unit 101. A memory (not shown) is used as appropriate for generating the metadata and the recording rate map.

FIG. 2 shows the file structure in Embodiment 1 of the present invention, namely the logical file structure in the hard disk drive 102, which is the dubbing source.
Reference numeral 20 denotes the root directory, which is the top-level directory of the logically hierarchical file structure. Reference numeral 21 denotes the multimedia directory, located below the root directory 20, and 22 denotes the metadata directory, likewise located below the root directory 20. Reference numeral 23 denotes the management information file, which describes management information (including attribute information and playback time information) for the programs recorded on the hard disk drive 102. Reference numeral 24 denotes the multimedia data file, in which at least one of a video stream and an audio stream, obtained by encoding and compressing the video signal or audio signal of a program, is multiplexed together with playback time information. Reference numeral 25 denotes a backup file of the management information file 23 and the like. Reference numeral 26 denotes the metadata file, a logical file that contains feature data associated with the multimedia data file 24 and is independent of the multimedia data file 24.

Although the example places the multimedia data file 24 and the metadata file 26 in separate directories, they may be placed in the same directory, or the metadata file 26 may be placed directly in the root directory.
The multimedia data file 24 and the metadata file 26 may also be divided according to the number of programs, or split into a plurality of files because of a file size limit.

The data format of the metadata file 26 in this embodiment is not restricted; it may be a text format or a binary format. Encryption may also be applied to prevent tampering by third parties and leakage of information.

Information indicating whether the metadata or the metadata file 26 exists, or, if it is described, whether it holds valid values, may also be recorded in the management information file 23. By referring to that information recorded on the storage medium, the presence or validity of the metadata or the metadata file 26 can be determined quickly.

FIG. 3 shows the metadata structure in Embodiment 1 of the present invention, namely the data structure of the metadata file 26 recorded on the hard disk drive 102, which is the primary recording medium. The description below follows the hierarchical structure shown in the figure. As shown in FIG. 3(A), the metadata 30 is located at the top level of the data structure.

Next, as shown in FIG. 3(B), the metadata 30 includes metadata management information 31a, which collectively describes information on the metadata as a whole; N (N is an integer of 1 or more) video object (hereinafter, VOB) metadata information search pointers 31b-1 to 31b-N; and N pieces of VOB metadata information 31c-1 to 31c-N.

The data recorded in the multimedia data file 24 of FIG. 2 is divided into one or more VOBs. A VOB may correspond to one program, or may be a unit produced by splitting because of a file size limit. As shown in FIG. 3(B), VOB metadata information 31c-n (any one of 31c-1 to 31c-N) is prepared for each VOB in the multimedia data file 24. That is, the n-th VOB (VOBn) has corresponding n-th VOB metadata information 31c-n. If a table or offset indicating the correspondence between VOB numbers and the numbers of the VOB metadata information 31c-n is provided, the two numbers need not match. Moreover, a plurality of pieces of VOB metadata information (such as 31c-n) may be prepared for one VOB, and one piece of VOB metadata information 31c-n may relate or correspond to a plurality of VOBs. A VOB that has no associated metadata may have no corresponding VOB metadata information (equivalent to 31c-n).

The n-th VOB metadata information search pointer 31b-n shown in FIG. 3(B) holds the start address information of the n-th VOB metadata information 31c-n. The total number N of pieces of VOB metadata information 31c-1 to 31c-N is recorded in the metadata management information 31a.
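The role of the search pointers can be illustrated with a small binary layout. The layout below (a 4-byte little-endian count N, then N 4-byte start addresses, then the N payloads back to back) is a hypothetical one chosen for the sketch; the publication leaves the data format open, and the helper names `pack_metadata` and `read_vob_info` are likewise assumptions.

```python
import struct
from typing import List


def pack_metadata(vob_infos: List[bytes]) -> bytes:
    """Serialize: count N, then N start addresses (search pointers), then the payloads."""
    n = len(vob_infos)
    header_size = 4 + 4 * n  # management info (N) plus N search pointers
    pointers, offset = [], header_size
    for info in vob_infos:
        pointers.append(offset)  # start address of this VOB's metadata information
        offset += len(info)
    return struct.pack(f"<I{n}I", n, *pointers) + b"".join(vob_infos)


def read_vob_info(blob: bytes, index: int) -> bytes:
    """Jump straight to the index-th VOB metadata information via its search pointer."""
    (n,) = struct.unpack_from("<I", blob, 0)
    if not 0 <= index < n:
        raise IndexError(index)
    pointers = struct.unpack_from(f"<{n}I", blob, 4)
    start = pointers[index]
    # Assumption: payloads are contiguous, so the next pointer marks the end.
    end = pointers[index + 1] if index + 1 < n else len(blob)
    return blob[start:end]
```

The point of the pointer table is that the n-th entry can be read without scanning the preceding entries, which may have different sizes.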

Next, as shown in FIG. 3(C), each piece of VOB metadata information 31c-n includes metadata general information 32a and video shot map information 32b.

The metadata general information 32a describes, among other things, the content information of the VOB to which the VOB metadata information 31c-n in the layer above corresponds, and the start address information of the corresponding video shot map information 32b. The VOB content information described in the metadata general information 32a includes the program name, producer name, performer names, a description of the content, and the broadcast date, time, and channel of the recorded program.

As shown in detail in FIG. 3(D), the video shot map information 32b includes video shot map general information 33a and M (M is an integer of 1 or more) video shot entries 33b-1 to 33b-M. Each fragment obtained by dividing the video stream or audio stream recorded in the multimedia data file 24 of FIG. 2 along the playback time axis is a video shot. M video shot entries 33b-1 to 33b-M are prepared, corresponding to the total number M of video shots in the referenced VOB. That is, the m-th video shot has a corresponding m-th video shot entry 33b-m.
If a table or offset indicating the correspondence between video shot numbers and the numbers of the video shot entries 33b-m is provided, the two numbers need not match. Moreover, a plurality of video shot entries (such as 33b-m) may be prepared for one video shot, and one video shot entry 33b-m may cover (correspond to) a plurality of video shots. The total number M of video shot entries 33b-1 to 33b-M is described in the video shot map general information 33a.

As shown in FIG. 3(E), the video shot entry 33b-m includes video shot start time information 34a, video shot end time information 34b, and a video shot importance level 34c.
The video shot start time information 34a is the playback start time (presentation time) or start frame position information of a video shot, that is, of one of the fragments obtained by dividing the video stream or audio stream recorded in the multimedia data file 24 along the playback time axis.
The video shot end time information 34b is the playback end time (presentation time) or end frame position information of that video shot.
The video shot importance level 34c is a numerical value assigned to that video shot and indicates the degree of importance of the content.
The video shot end time information 34b may be omitted if the duration of each video shot relative to the video shot start time information 34a can be obtained separately.
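The three fields of a video shot entry, and the optional end time, can be sketched as a small data structure. The sketch assumes one particular way of recovering an omitted end time, namely that shots are contiguous so a shot ends where the next one starts; that assumption, and the names `VideoShotEntry` and `resolve_end_times`, are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VideoShotEntry:
    start_time: float                 # corresponds to start time information 34a
    importance: float                 # corresponds to importance level 34c
    end_time: Optional[float] = None  # corresponds to end time information 34b; may be omitted


def resolve_end_times(entries: List[VideoShotEntry],
                      total_duration: float) -> List[VideoShotEntry]:
    """Fill in omitted end times, assuming shots are contiguous on the time axis."""
    ordered = sorted(entries, key=lambda e: e.start_time)
    resolved = []
    for i, entry in enumerate(ordered):
        if entry.end_time is not None:
            end = entry.end_time
        elif i + 1 < len(ordered):
            end = ordered[i + 1].start_time  # next shot's start closes this shot
        else:
            end = total_duration             # last shot runs to the end of the VOB
        resolved.append(VideoShotEntry(entry.start_time, entry.importance, end))
    return resolved
```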

The term “importance” in the video shot importance level 34c is a name of convenience. The value may be any of the following:
(a) an importance of the content of the video shot based on subjective evaluation (for example, high for a highlight scene);
(b) a value corresponding to the duration of cheering in the audio corresponding to the video shot;
(c) a value corresponding to the intensity of motion within the frame of the video shot; or
(d) a numerical value based on a physical measurement or index that does not rely on subjective evaluation.
Of these, (a) and (b) are called importance based on subjective evaluation, and (c) and (d) are called importance corresponding to a physical change amount. Here, the “physical change amount” refers to physical indices that can be obtained directly from the video data or audio data, such as motion vectors in image encoding, color histograms, and volume. Importance based on subjective evaluation, by contrast, refers to feature amounts in which physical indices obtained directly from the video data or audio data have been given a meaning, such as how exciting the video or audio is (for example, how enthusiastic the audience is).

FIG. 4 is a conceptual diagram of data synchronization in Embodiment 1 of the present invention. The description uses the basic concepts of the Video Recording format applied to DVD-R and DVD-RW media as an example, but the present invention is not limited to the Video Recording format. It is widely applicable to storage media formats in which a multimedia data file and a metadata file can be synchronized on the basis of playback time information.

In FIG. 4, reference numeral 40 denotes program chain information, which is described in the management information file 23 and describes the playback order of the multimedia data in the multimedia data file 24. Reference numeral 41 denotes N programs (N is an integer of 1 or more; only two are shown), which are playback units defined by the program chain information 40. Reference numerals 42a and 42b denote one or more cells (“cell 1”, “cell 2”), which are playback units defined by the program 41. Reference numerals 43a and 43b denote video object (VOB) information (“VOB1 information”, “VOB2 information”), which is described in the management information file 23 and describes the reference destination of the actual video data or audio data corresponding to the playback time information (presentation time) specified by the cell 42. Reference numerals 44a and 44b denote time map tables, which offset the playback time information (presentation time) defined by the VOB information 43 and convert it into address information of the actual video data or audio data. Reference numerals 45a and 45b denote video object units (hereinafter, VOBUs), in which the actual video data or audio data described in the multimedia data file 24 is multiplexed in a packet structure together with playback time information and subdivided into the minimum units accessed by the video and audio playback system. In the illustrated example, the time map table 44a for VOB1 includes VOBU1 to VOBUP, and the time map table 44b for VOB2 includes VOBU1 to VOBUQ.
The video shot 1 entry to video shot R entry and the video shot 1 entry to video shot S entry, denoted by reference numerals 33b-1 to 33b-R and 33b-1 to 33b-S, form part of the VOB1 metadata 31c-1 and the VOB2 metadata 31c-2 of FIG. 3, respectively, and correspond to the video shot 1 entry 33b-1 to the video shot M entry 33b-M of FIG. 3.

FIG. 5 shows an example of the importance map and the encoding rate map generated by the metadata analysis unit 121 in Embodiment 1 of the present invention. FIG. 5(a) shows an example of an importance map, which plots the importance level of a video object against playback time. FIG. 5(b) shows an example of an encoding rate map, which plots against playback time the encoding rate reference value to be set when the video object is dubbed (re-encoded).

In the importance map (FIG. 5(a)), the horizontal axis 51 represents playback time and the vertical axis 50 represents the importance level. As shown in the figure, the importance level graph 52 varies continuously within a predetermined range (for example, [0, 1] or [0, 100]). The importance map upper limit 53 and the importance map lower limit 54 are the upper and lower bounds of that range; that is, the importance level graph 52 stays between the importance map lower limit 54 and the importance map upper limit 53.

The time unit on the horizontal axis 51 is based on the values of the video shot start time information 34a and the video shot end time information 34b of FIG. 3, and the importance level is based on the value of the video shot importance level 34c. In other words, the graph of FIG. 5 plots one importance level (the video shot importance level 34c) for each video shot (scene), which extends from the video shot start time given by the video shot start time information 34a to the video shot end time given by the video shot end time information 34b.

A portion with a high importance level is a highlight scene (video shot) of the content according to some criterion, and can be regarded as a portion of high viewing value to the user.

In the encoding rate map (FIG. 5(b)), the horizontal axis 56 represents playback time and the vertical axis 55 represents the reference value for the re-encoding rate used during dubbing. As shown in the figure, the encoding rate setting graph 57 varies continuously between the encoding rate conversion upper limit 58 and the encoding rate conversion lower limit 59. Data representing the encoding rate conversion upper limit 58 and the encoding rate conversion lower limit 59 are supplied in advance from the system control unit 101 to the metadata analysis unit 121 and stored there.

As described above, when an encoding rate suited to each video scene is derived from the importance level, the bit rate is distributed efficiently among scenes within the upper and lower limits set by the moving image encoding apparatus, so that a multimedia data file recorded within the specified encoding rate range can be generated. In addition, adjusting the upper and lower limits controls both the average encoding rate of the multimedia data file and how strongly the importance level influences the rate.

Where the re-encoding rate is set high, a high encoding rate is assigned during re-encode dubbing, so that high-definition video less prone to block noise and image distortion can be recorded.

The operation of the moving image encoding apparatus 100 is described below.
First, an outline of a general recording process in a video recording format or the like (the same processing as normal recording) is given with reference to FIGS. 1, 2, 3, and 4. Embodiment 1 describes recording on the hard disk drive 102 as the dubbing source recording medium (primary recording medium), but the dubbing source recording medium may of course be the optical disc 103.

The input video signal 110 supplied from outside is encoded by the video encoder 112 using a compression coding scheme such as MPEG-2 to generate a video stream. Likewise, the input audio signal 111 is encoded by the audio encoder 113 using a compression coding scheme such as AC-3 to generate an audio stream. The video stream and audio stream are multiplexed by the multiplexer 114 to generate the multimedia data file 24. The multimedia data file 24 is then written sequentially to the buffer memory 115 and recorded on the hard disk drive 102 under instructions from the system control unit 101.

The multimedia data file 24 is recorded in the directory structure shown in FIG. 2. When the multimedia data file 24 is recorded, its attribute information and playback time information are recorded in the management information file 23. After that, the backup file 25 is created or updated so that it holds the same information as the management information file 23.

Next, the recording process in the moving image encoding apparatus 100 of Embodiment 1 is described in more detail with reference to FIGS. 1 to 5. Recording to the hard disk drive 102 is again described, but recording to the optical disc 103 is of course also possible.

For dubbing according to this embodiment, a program is first recorded on the hard disk drive 102, the primary recording medium. During recording, when the video encoder 112 compression-encodes the input video signal 110 into MPEG-2 or the like, the video feature extraction unit 142 extracts feature quantities such as the amount of inter-frame motion vectors, the amount of change in the color histogram, and the detection of persons or faces by image recognition techniques.
Similarly, when the audio encoder 113 compression-encodes the input audio signal 111 into AC-3 or the like, the audio feature extraction unit 143 extracts feature quantities such as changes in coefficient values or frequency information after digital sampling, changes in audio level, and speaker changes or applause scenes detected by speech recognition techniques.

The feature quantities extracted by the video feature extraction unit 142 or the audio feature extraction unit 143 are supplied to the metadata generation unit 120 and analyzed there, and the metadata generation unit 120 calculates the video shot importance level 34c.
For example, when the audio feature extraction unit 143 finds that an audio signal identified as applause or cheering by a speech recognition technique continues for a long time, a value corresponding to its duration is assigned as the importance. Likewise, when the video feature extraction unit 142 finds a portion with a large amount of motion vectors, that is, a video signal with intense motion, a value corresponding to the amount of motion is assigned as the importance. This importance becomes the video shot importance level 34c (FIG. 3(E)), and the playback time information of the portion where the feature was observed becomes the video shot start time information 34a and the video shot end time information 34b (FIG. 3(E)).
This processing is repeated to generate a plurality of video shot entries 33b-1 to 33b-M, which form the metadata 30. The metadata 30 is written as the metadata file 26, via the buffer memory 115, to a predetermined logical location on the hard disk drive 102 such as the metadata directory 22.
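The assignment of an importance value to an observed feature can be sketched as follows. This is only an illustration under stated assumptions: the text says the value should correspond to the applause duration or to the amount of motion, but gives no scaling, so the saturation constants, the [0.0, 1.0] normalization, the use of `max` to combine the two cues, and all function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VideoShotEntry:
    start_time: float  # video shot start time information 34a (seconds)
    end_time: float    # video shot end time information 34b (seconds)
    importance: float  # video shot importance level 34c, here in [0.0, 1.0]


def importance_from_applause(duration_s: float, saturation_s: float = 10.0) -> float:
    """Longer applause or cheering gives higher importance.

    saturation_s is an assumed tuning constant, not a value from the text.
    """
    return min(max(duration_s, 0.0) / saturation_s, 1.0)


def importance_from_motion(magnitude: float, max_magnitude: float = 64.0) -> float:
    """Larger motion vector amount gives higher importance.

    max_magnitude is an assumed tuning constant, not a value from the text.
    """
    return min(max(magnitude, 0.0) / max_magnitude, 1.0)


def make_entry(start_s: float, end_s: float,
               applause_s: float = 0.0, motion: float = 0.0) -> VideoShotEntry:
    """Build one video shot entry; combining the cues with max() is an assumption."""
    level = max(importance_from_applause(applause_s),
                importance_from_motion(motion))
    return VideoShotEntry(start_s, end_s, level)
```

Calling `make_entry` once per detected portion and collecting the results gives a list that plays the role of the video shot entries 33b-1 to 33b-M.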

The metadata file 26 need not be generated in real time, simultaneously with the recording of the multimedia data file 24. The necessary data, including at least the video shot importance level 34c, may be held in a memory area of the metadata generation unit 120 or the system control unit 101, and the metadata file 26 may be generated and recorded on the hard disk drive 102 afterwards.
Alternatively, the necessary data including the video shot importance level 34c may first be recorded on the hard disk drive 102 and later (before the dubbing described below) read back from the hard disk drive 102 to generate the metadata file 26, which is then written to the hard disk drive 102.

Furthermore, the data format of the metadata file 26 in Embodiment 1 is not restricted; it may be text or binary. It may also be encrypted to prevent tampering by third parties and leakage of information.

Next, the dubbing process in the moving image encoding apparatus 100 of Embodiment 1 is described with reference to FIGS. 1 to 5. The description covers re-encode dubbing of a VOB in the multimedia data file 24 recorded on the hard disk drive 102 to the optical disc 103, but dubbing from the optical disc 103 to the hard disk drive 102 is of course also possible.

First, an outline of general re-encode dubbing (the same processing as normal re-encoding) is given with reference to FIGS. 1 and 2.
When a desired VOB (program) recorded on the hard disk drive 102 is re-encode dubbed to the optical disc 103, the program recorded on the hard disk drive 102 as the multimedia data file 24 is first decoded by the demultiplexer 130, the video decoder 131, and the audio decoder 133. The decoded output video signal 132 and output audio signal 134 are then encoded by the video encoder 112 and the audio encoder 113, multiplexed by the multiplexer 114, and written to the optical disc 103 as the multimedia data file 24 via the buffer memory 115 and the recording/playback drive 104.

Next, the dubbing process of Embodiment 1 is described in detail with reference to FIGS. 1 to 5. First, the program chain information 40 identifies the program 41 that makes up the selected broadcast program on the hard disk drive 102 and the cells 42 that make up that program 41. This determines the number of the VOB to be referenced and the presentation times (PTM) of the playback start and playback end of each cell.

The moving image encoding apparatus 100 reads the metadata file 26 recorded on the hard disk drive 102 and loads it into the metadata analysis unit 121 via the buffer memory 115, either before the viewer instructs dubbing or after the viewer selects the desired program as the dubbing target, so that the data structure described in the metadata 30 can be referenced as needed.

The example described here configures the metadata file 26 and the multimedia data file 24 as independent logical files, but the metadata 30 may instead be described within the data structure of the management information file 23 or multiplexed into the multimedia data file 24.

Configuring the metadata file 26 as a logical file independent of the multimedia data file 24 has the advantage that the important scenes of a program can be located quickly by reading and analyzing only the metadata file 26, without reading the entire multimedia data file 24.

From the data structure described in the metadata 30, the metadata analysis unit 121 reads the video shot start time information 34a, the video shot end time information 34b, and the video shot importance level 34c of each video shot of the program to be dubbed. These give the importance level for each unit of time, from which the importance map shown in FIG. 5(a) can be generated.
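Building the importance map from the per-shot triples can be sketched as below. This is an illustration only: the one-second unit time, the default level of 0.0 for time not covered by any shot, the choice of the larger level where shots overlap, and the function name are all assumptions that the text does not specify.

```python
def build_importance_map(shots, total_s, step_s=1.0, default=0.0):
    """Return one importance level per unit time.

    shots   : iterable of (start_s, end_s, importance), i.e. 34a, 34b, 34c
    total_s : playback time of the program in seconds
    step_s  : unit time of the map (assumed to be 1 second here)
    default : level for time not covered by any shot (assumed)
    """
    n = int(total_s / step_s)
    levels = [default] * n
    for start_s, end_s, importance in shots:
        first = max(int(start_s / step_s), 0)
        last = min(int(end_s / step_s), n)
        for k in range(first, last):
            # Where shots overlap, keep the larger level (assumed policy).
            levels[k] = max(levels[k], importance)
    return levels


# Two shots over a six-second program.
print(build_importance_map([(0, 2, 0.3), (2, 5, 0.8)], total_s=6))
# prints [0.3, 0.3, 0.8, 0.8, 0.8, 0.0]
```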

The metadata analysis unit 121 then generates, from the importance map (FIG. 5(a)), the encoding rate map (FIG. 5(b)) that serves as the reference for rate setting during dubbing. The encoding rate map gives the reference value for the recording rate when the content is re-encoded for dubbing: scenes with a high importance level in the importance map are given a high encoding rate in the encoding rate map.

Generating the encoding rate map (FIG. 5(b)) from the importance map (FIG. 5(a)) requires a numerical conversion. For example, the importance level is set in a range such as [0.0, 1.0], whereas the encoding rate is typically set in a range such as [3 Mbps, 8 Mbps]. Because the two quantities differ in both units and range, the metadata analysis unit 121 must design an encoding rate map that reflects the importance level while staying between the encoding rate conversion lower limit 59 and the encoding rate conversion upper limit 58.

The description below assumes that the encoding rate conversion upper limit 58 and the encoding rate conversion lower limit 59 have already been set in the moving image encoding apparatus 100. Within the range that the video encoder 112 can encode, these values may be chosen manually by the user or determined automatically by the moving image encoding apparatus 100.

The values of the encoding rate setting graph 57 are determined, for example, by rescaling the range from the importance map lower limit 54 to the importance map upper limit 53 defined in the importance map (FIG. 5(a)) onto the range from the encoding rate conversion lower limit 59 to the encoding rate conversion upper limit 58.

For example, suppose that in FIG. 5(a) the importance map upper limit 53 is 1.0 and the importance map lower limit 54 is 0.0, and that in the encoding rate map (FIG. 5(b)) the encoding rate conversion upper limit 58 is 8.0 Mbps and the encoding rate conversion lower limit 59 is 3.0 Mbps. The scale is then changed by, for example, projecting the importance range 0.0 to 1.0 onto the encoding rate range 3.0 Mbps to 8.0 Mbps. That is, the recording rate setting X[i] for video shot [i] can be obtained by a formula such as X[i] = (encoding rate conversion lower limit 59) + (importance level of video shot [i] - importance map lower limit 54) × (encoding rate conversion upper limit 58 - encoding rate conversion lower limit 59) / (importance map upper limit 53 - importance map lower limit 54). With this formula, a scene with an importance level of 0.8, for example, is converted to an encoding rate setting of 3.0 + 0.8 × 5.0 = 7.0 Mbps.
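The scale conversion can be written as a small function. The sketch below is a plain linear rescaling that adds the lower rate limit as an offset, which is what reproduces the 0.8 to 7.0 Mbps example in the text; the function name, the default values, and the clamping of out-of-range inputs are assumptions for illustration.

```python
def importance_to_rate(importance, imp_min=0.0, imp_max=1.0,
                       rate_min=3.0, rate_max=8.0):
    """Linearly project an importance level onto an encoding rate in Mbps.

    imp_min / imp_max   : importance map lower limit 54 / upper limit 53
    rate_min / rate_max : encoding rate conversion lower limit 59 / upper limit 58
    """
    if imp_max <= imp_min:
        raise ValueError("imp_max must be greater than imp_min")
    # Clamp the input so the result never leaves [rate_min, rate_max] (assumed).
    clamped = min(max(importance, imp_min), imp_max)
    scale = (rate_max - rate_min) / (imp_max - imp_min)
    return rate_min + (clamped - imp_min) * scale


print(importance_to_rate(0.8))  # prints 7.0
```

The same function works for an importance range of [0, 100] by passing `imp_max=100.0`.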

FIG. 5 illustrates the case where the range between the encoding rate conversion upper limit 58 and the encoding rate conversion lower limit 59 of the encoding rate map (FIG. 5(b)) is set narrow, with the lower limit in particular raised considerably. With the limits set this way, the encoding rate setting graph 57 becomes a curve whose values are compressed vertically, as shown in the figure. And because the encoding rate conversion lower limit 59 is raised considerably, the encoding rate setting graph 57 stays at high bit rates throughout.

Thus, changing the encoding rate conversion upper limit 58 and the encoding rate conversion lower limit 59 makes it possible to adjust, at the rate design stage, both the influence of the importance level and the average encoding rate. When the importance level is compressed vertically as in FIG. 5, the encoding rate is only weakly affected by the importance level even if the importance level varies widely. Conversely, when the importance level is stretched vertically, even small variations in the importance level are reflected strongly in the encoding rate.
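The effect of the two limits can be checked numerically. The sketch below applies a linear projection of importance levels in [0.0, 1.0] to two different limit settings and compares the mean rate and the spread between the highest and lowest rate; the sample importance levels and limit values are made up for illustration, and the function names are hypothetical.

```python
def rate_map(levels, rate_min, rate_max):
    """Project importance levels in [0.0, 1.0] onto [rate_min, rate_max] Mbps."""
    return [rate_min + lv * (rate_max - rate_min) for lv in levels]


def summarize(rates):
    """Return (mean rate, spread between highest and lowest rate)."""
    return sum(rates) / len(rates), max(rates) - min(rates)


levels = [0.0, 0.5, 1.0]  # illustrative importance levels

# Narrow range with a raised lower limit: high average, weak influence.
print(summarize(rate_map(levels, 6.0, 8.0)))  # prints (7.0, 2.0)

# Wide range: lower average, strong influence of the importance level.
print(summarize(rate_map(levels, 2.0, 8.0)))  # prints (5.0, 6.0)
```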

When the encoding rate map (FIG. 5(b)) is created, the rate may be designed from information indicating viewing value that is generated from the physical change amounts described above, or that information may be combined with the physical change amounts of the video themselves (motion vectors, color histograms, and so on). The latter allows efficient encoding that also takes the coding difficulty of the image into account.

In this way the metadata analysis unit 121 creates the encoding rate map (FIG. 5(b)). The generated encoding rate map is stored in the recording rate map holding unit 122.

Once the encoding rate map (FIG. 5(b)) for the dubbing target VOB has been determined, the actual dubbing process starts. The VOB held in the multimedia data file 24 read from the hard disk drive 102 is supplied to the demultiplexer 130 via the buffer memory 115. The demultiplexer 130 separates the multimedia data into a video stream and an audio stream and supplies them to the video decoder 131 and the audio decoder 133, respectively, which decode the encoded streams.

The decoded output video signal 132 and output audio signal 134 are supplied to the video encoder 112 and the audio encoder 113. At this point the recording rate changing means 123 sets the code amounts of the video encoder 112 and the audio encoder 113 using the per-scene encoding rate reference values from the encoding rate map (FIG. 5(b)) held in the recording rate map holding unit 122. As a result, a video stream and an audio stream are generated with the code amount distributed appropriately according to the semantic importance of each scene. Thereafter, as in normal recording, the streams are multiplexed by the multiplexer 114 and recorded on the optical disc 103 as the multimedia data file 24 via the buffer memory 115 and the recording/playback drive 104. When the multimedia data file 24 is written to the optical disc 103, the metadata file 26 held in the buffer memory 115 may be written along with it.
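How the per-scene reference value might be looked up while re-encoding can be sketched as a search over scene start times. The representation of the encoding rate map as a sorted list of (start time, rate) pairs, the class name, and the method name are assumptions; the text does not describe the internal format of the recording rate map holding unit 122.

```python
from bisect import bisect_right


class RateMap:
    """Encoding rate map held as scene start times and reference rates (assumed layout)."""

    def __init__(self, segments):
        # segments: list of (start_s, rate_mbps), sorted by start_s
        self._starts = [start for start, _ in segments]
        self._rates = [rate for _, rate in segments]

    def rate_at(self, playback_s):
        """Return the reference rate of the scene containing playback_s."""
        index = bisect_right(self._starts, playback_s) - 1
        if index < 0:
            raise ValueError("playback time precedes the first scene")
        return self._rates[index]


rate_map = RateMap([(0.0, 4.0), (120.0, 7.0), (300.0, 3.5)])
print(rate_map.rate_at(150.0))  # prints 7.0
```

A re-encoding loop would call `rate_at` with the presentation time of each scene and pass the result to the encoder as its target rate.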

Embodiment 2.
FIG. 6 is an explanatory diagram of an example of encoding rate map correction in Embodiment 2. In FIG. 6, reference numeral 60 denotes the encoding rate setting graph (before correction), which holds the encoding rate setting information before correction, and 61 denotes the average encoding rate (before correction), the average over the whole of graph 60. Reference numeral 62 denotes the encoding rate setting graph (after correction), which holds the encoding rate setting information after correction, and 63 denotes the target average encoding rate, the average encoding rate after correction.

FIG. 7 is an explanatory diagram of another example of encoding rate map correction in Embodiment 2. In FIG. 7, reference numeral 70 denotes the encoding rate setting graph (before correction), which holds the encoding rate setting information before correction, and 71 denotes the average encoding rate (before correction), the average over the whole of graph 70. Reference numeral 72 denotes the encoding rate setting graph (after correction), which holds the encoding rate setting information after correction; 73 denotes the target average encoding rate, the average encoding rate after correction; and 74 denotes the encoding rate upper limit, such as the highest recording rate that the video encoder 112 can be set to.

In FIGS. 6 and 7, components identical to those described in Embodiment 1 are given the same reference numerals. Embodiment 2 describes a method of re-correcting the encoding rate map according to the capacity of the dubbing destination recording medium. All other processing is the same as in Embodiment 1.

An example of correcting the recording rates of the encoding rate map in Embodiment 2 (FIG. 6) is described in detail with reference to FIGS. 1, 2, 3, 4, and 6. Embodiment 2 is characterized in that the encoding rate map is re-corrected according to the free capacity of the dubbing destination.

When dubbing to a fixed-capacity recording medium such as an optical disc, the content must normally be encoded at an average rate appropriate to the free capacity of the dubbing destination medium. A multimedia data file encoded at a recording rate too high for the free capacity will not fit on the dubbing destination medium. Conversely, if the recording rate is set too low, the medium is not used effectively and the achievable picture and sound quality is sacrificed.

Before dubbing starts, the recording rate map holding unit 122 is assumed to hold the encoding rate map (FIG. 6) for the VOB held in the dubbing target multimedia data file 24. When dubbing is performed, the system control unit 101 obtains the free capacity of the optical disc 103, the dubbing destination recording medium. From the recording time of the dubbing target VOB in the multimedia data file 24 and the free capacity of the dubbing destination, it then determines the target average encoding rate 63: the highest bit rate at which the result still fits in the free capacity of the optical disc 103.

The calculation of the target average encoding rate 63 is explained concretely with reference to FIG. 6. Assume that the average encoding rate (before correction) 61 is 6.0 Mbps, the free capacity of the dubbing destination recording medium is 4.5 GB (gigabytes), and the playback time of the dubbing target VOB in the multimedia data file 24 is 2 hours. The target average encoding rate 63 is obtained by dividing the free capacity of the dubbing destination medium by the recording time: (4.5 Gbytes × 8 bits/byte) / (2 hours × 60 minutes × 60 seconds) = 36 Gbit / 7,200 s = 5.0 Mbps.
This calculation is performed by the system control unit 101, which passes the resulting target average encoding rate 63 to the recording rate changing means 123.
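The capacity calculation can be reproduced in a few lines. The sketch assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second), which is what makes the example come out at exactly 5.0 Mbps, and it ignores any multiplexing or file system overhead; the function name is hypothetical.

```python
def target_average_rate_mbps(free_bytes, duration_s):
    """Highest average rate (Mbps) at which duration_s of content fits in free_bytes."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    bits = free_bytes * 8          # bytes to bits
    return bits / duration_s / 1e6  # bits per second to Mbps


free_bytes = 4.5e9          # 4.5 GB of free space (decimal gigabytes assumed)
duration_s = 2 * 60 * 60    # 2 hours of playback time
print(target_average_rate_mbps(free_bytes, duration_s))  # prints 5.0
```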

If encoding were performed at the recording rates as set, with the average encoding rate (before correction) 61, the result would exceed the capacity of the optical disc 103, the dubbing destination recording medium. The whole graph must therefore be lowered so that the average rate of the encoding rate setting graph (before correction) 60 equals the target average encoding rate 63. The encoding rate setting graph (before correction) 60 is corrected into the encoding rate setting graph (after correction) 62 by shifting it downward as a whole. The shift may be applied uniformly in this way, or the rates may be designed with the amount of change weighted according to importance. This correction is performed by the recording rate changing means 123, which uses the target average encoding rate 63 supplied from the system control unit 101 to correct the recording rates held in the recording rate map holding unit 122.
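Both correction styles can be sketched as follows. The uniform version adds the same offset to every scene so that the duration-weighted average equals the target. The weighted version is one possible reading of "weighting the amount of change according to importance": it uses the weight (1 - importance), so that less important scenes absorb more of the change. That weighting, the tuple layouts, and the function names are assumptions, and neither function enforces encoder rate limits.

```python
def shift_to_target(segments, target_mbps):
    """Uniform shift. segments: list of (duration_s, rate_mbps)."""
    total = sum(d for d, _ in segments)
    mean = sum(d * r for d, r in segments) / total
    delta = target_mbps - mean
    return [(d, r + delta) for d, r in segments]


def weighted_shift_to_target(segments, target_mbps):
    """Importance-weighted shift.

    segments: list of (duration_s, rate_mbps, importance in [0.0, 1.0]).
    Scenes with lower importance take a larger share of the change.
    """
    total = sum(d for d, _, _ in segments)
    mean = sum(d * r for d, r, _ in segments) / total
    weights = [1.0 - imp for _, _, imp in segments]
    weighted = sum(d * w for (d, _, _), w in zip(segments, weights))
    if weighted == 0:
        # Every scene has importance 1.0: fall back to a uniform shift.
        return [(d, r + target_mbps - mean, imp) for d, r, imp in segments]
    k = (target_mbps - mean) * total / weighted
    return [(d, r + k * w, imp) for (d, r, imp), w in zip(segments, weights)]


# One hour at 7.0 Mbps and one hour at 5.0 Mbps average 6.0 Mbps; target is 5.0.
print(shift_to_target([(3600, 7.0), (3600, 5.0)], 5.0))
# prints [(3600, 6.0), (3600, 4.0)]
print(weighted_shift_to_target([(3600, 7.0, 1.0), (3600, 5.0, 0.0)], 5.0))
# prints [(3600, 7.0, 1.0), (3600, 3.0, 0.0)]
```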

This example has described a method of automatically resetting the average rate according to the free space at the dubbing destination, but the target average encoding rate 63 may instead be determined manually.

Next, another example (FIG. 7) of correcting the recording rate of the encoding rate map in the second embodiment is described in detail with reference to FIGS. 1, 2, 3, 4, and 7. This example covers the case where the target average encoding rate is raised because the free space at the dubbing destination is sufficiently large. This makes it possible to encode higher-quality moving images while using the disc recording capacity efficiently.

As described in the first example of correcting the recording rate of the encoding rate map in the second embodiment (FIG. 6), the operation of the system control unit 101 and the recording rate changing means 123 can correct the encoding rate setting graph (before correction) 70 into the encoding rate setting graph (after correction) 72 in accordance with the target average encoding rate 73. However, when the encoding rate setting is corrected in this way, the encoding rate of some video scenes may exceed the encoding rate upper limit value 74. In that case, the encoding rate of those scenes must be adjusted so that it falls within the encoding rate upper limit value 74. In this example, therefore, the encoding rate curve 72 is corrected again so that no corrected encoding rate value exceeds the encoding rate upper limit value 74.

In the encoding rate setting graph (after correction) 72, a video scene whose rate exceeds the encoding rate upper limit value 74 is reset to a value equal to the upper limit value 74, as shown in FIG. 7. The code amount made surplus by this operation is redistributed to the other video scenes that were not corrected, so that the total code amount does not change. The correction that includes this limitation by the upper limit value 74 is also performed by the recording rate changing means 123. For this purpose, data representing the upper limit value 74 is supplied from the system control unit 101 to the recording rate changing means 123 and stored there.
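The clamp-and-redistribute step can be sketched as below. The text states only that the surplus is redistributed to scenes that were not corrected while the total code amount is preserved; how the surplus is shared is not specified, so this sketch assumes it is spread as an equal rate increase over all uncapped sections, repeating until no section exceeds the limit. The function name `clamp_and_redistribute` is hypothetical.

```python
from typing import List, Sequence


def clamp_and_redistribute(
    rates: Sequence[float], durations: Sequence[float], upper: float
) -> List[float]:
    """Cap rates at `upper` and move the surplus code amount to other sections.

    Assumption: the surplus is shared as an equal rate increase over all
    sections that have not been capped. If every section ends up capped,
    the remaining surplus cannot be placed and the total decreases.
    """
    result = list(rates)
    capped = [False] * len(result)

    while True:
        surplus = 0.0  # code amount (rate x time) removed by capping
        for i, rate in enumerate(result):
            if rate > upper:
                surplus += (rate - upper) * durations[i]
                result[i] = upper
                capped[i] = True
        if surplus == 0.0:
            break  # nothing exceeded the limit in this pass

        free_time = sum(d for d, c in zip(durations, capped) if not c)
        if free_time == 0:
            break  # no uncapped section left to absorb the surplus

        bump = surplus / free_time
        for i in range(len(result)):
            if not capped[i]:
                result[i] += bump  # may exceed the limit; fixed next pass

    return result


# One scene above a 9.0 Mbps limit; its surplus is shared by the other two.
print(clamp_and_redistribute([10.0, 7.0, 5.0], [1.0, 1.0, 1.0], 9.0))
```

Each pass that finds a surplus caps at least one additional section, so the loop terminates after at most as many passes as there are sections. The same structure could be mirrored for a lower limit.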

This example describes the case where an encoding rate upper limit value 74 exists, but the same approach may also be applied when there is a recording rate lower limit value.

With this configuration, the encoding rate can be controlled within the range in which the video encoder 112 can perform encoding. In the second embodiment, the encoding rate upper limit value 74 is determined by the constraints of the hardware characteristics of the video encoder 112, but it may instead be determined by the constraints of the application standard used for recording on the optical disc 103.

In addition, because the encoding rate reference value for re-encoding, calculated from the importance, can be corrected again to match the capacity of the dubbing destination recording medium, dubbing can be performed with efficient code amount control suited to the capacity of the destination medium.

Embodiment 3.
FIG. 8 is a conceptual diagram of summary dubbing in the third embodiment. In the figure, 80 denotes a threshold used to evaluate the importance level graph 81; 82 denotes the importance level graph after the importance level of scenes at or below a specific value has been corrected based on the threshold 80; and 83 denotes the dubbing target sections and skip target sections of the video scenes determined by the threshold 80. Everything else is the same as described in the first embodiment, and its description is omitted here.

The third embodiment is characterized in that the sections to be dubbed are determined according to the importance map shown in FIG. 8(a). Specifically, when a value equal to or less than a certain threshold is set in the importance map, the section concerned is not dubbed; processing jumps to the start point of the next dubbing section and dubbing continues from there.

The dubbing control sequence of the moving image encoding apparatus 100 according to the third embodiment is described with reference to FIG. 8. When the video shot importance level 34c of the video shots related to the VOB to be dubbed forms the importance level graph 81, the video shots that fall below the threshold 80 are those in the section from a1 to a2 and the section from b1 to b2. The metadata analysis unit 121 or the system control unit 101 redesigns the graph into the importance level graph 82, in which the importance level in those sections is reset to a specific value (for example, 0 in FIG. 8). Thereafter, when dubbing is performed, a section whose importance level has the specific value (for example, 0) is not dubbed; processing jumps to the start point of the next dubbing target section and dubbing continues. The dubbing process itself is performed in the same manner as in the first embodiment. When the importance level has the specific value (for example, 0), the recording rate changing means 123 outputs data indicating that the encoding rate is zero, so that the video encoder 112 and the audio encoder 113 do not perform encoding and, as a result, dubbing of that section is skipped.
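The section selection described above can be sketched as follows. This is an illustrative sketch only: the `VideoShot` record loosely mirrors the video shot start time 34a, end time 34b, and importance level 34c, but its field names, the function name `plan_summary_dubbing`, and the merging of back-to-back shots into one dubbing section are assumptions. The comparison uses "equal to or less than the threshold", following the wording of the importance map description.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class VideoShot:
    start: float       # shot start time in seconds (cf. 34a)
    end: float         # shot end time in seconds (cf. 34b)
    importance: float  # shot importance level (cf. 34c)


def plan_summary_dubbing(
    shots: Sequence[VideoShot], threshold: float
) -> List[Tuple[float, float]]:
    """Return the (start, end) sections to dub, skipping low-importance shots.

    A shot at or below the threshold is treated as having importance 0,
    i.e. an encoding rate of zero, and is left out of the plan.
    """
    sections: List[Tuple[float, float]] = []
    for shot in shots:
        if shot.importance <= threshold:
            continue  # skipped: jump to the next dubbing target section
        if sections and sections[-1][1] == shot.start:
            # Contiguous with the previous kept shot: extend that section.
            sections[-1] = (sections[-1][0], shot.end)
        else:
            sections.append((shot.start, shot.end))
    return sections


shots = [
    VideoShot(0.0, 10.0, 0.8),
    VideoShot(10.0, 20.0, 0.1),  # e.g. a low-importance section such as a1 to a2
    VideoShot(20.0, 30.0, 0.6),
    VideoShot(30.0, 40.0, 0.7),
    VideoShot(40.0, 50.0, 0.2),  # e.g. a low-importance section such as b1 to b2
]
print(plan_summary_dubbing(shots, threshold=0.3))
```

The returned list gives the sections that are read, decoded, and re-encoded in order; everything between them is skipped.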

By skipping the dubbing process in this way, a multimedia data file that excludes portions of low viewing value can be generated during re-encode dubbing. For example, when the importance level graph is created so that the importance level is set low in portions that have little relation to the main program, such as CM (commercial message) sections in television broadcasting, referring to the importance level makes it possible to exclude the CM sections and selectively dub only the main part of the program.

Although the above embodiments relate to a method and an apparatus for dubbing, the present invention can also be applied when data is read from one area of a recording medium, decoded and re-encoded, and then recorded in another area of the same recording medium, or recorded in (overwritten onto) the same area of the same recording medium.

In the above embodiments, the code amount is allocated based on both the video features and the audio features. This allows efficient encoding that takes into account the degree of importance of the video content and the effect of the image encoding difficulty. Instead, however, only the audio features may be extracted and the code amount allocated based on them.
With a configuration that allocates the code amount from the audio features alone, an index for determining the encoding rate at re-encoding can be generated from data representing audio features, which carry a relatively small amount of information, so re-encoding that reflects the feature information can be performed even with a small circuit scale.
Furthermore, in the above embodiments, the feature amount is extracted for each predetermined time segment, such as a scene or video shot consisting of one or more frames, but it may instead be extracted for each segment consisting of a predetermined amount of data (a segment defined by data amount).

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a system configuration block diagram showing Embodiment 1 of the present invention.
FIG. 2 is a file structure diagram showing Embodiment 1 of the present invention.
FIG. 3 is a metadata structure diagram showing Embodiment 1 of the present invention.
FIG. 4 is a data synchronization conceptual diagram showing Embodiment 1 of the present invention.
FIG. 5 shows the importance map obtained from the metadata, and the re-encoding rate map, in Embodiment 1 of the present invention.
FIG. 6 is an explanatory diagram of one example of the encoding rate map correction in Embodiment 2 of the present invention.
FIG. 7 is an explanatory diagram of another example of the encoding rate map correction in Embodiment 2 of the present invention.
FIG. 8 is a summary dubbing conceptual diagram showing Embodiment 3 of the present invention.

Explanation of symbols

20 root directory, 21 multimedia directory, 22 metadata directory, 23 information management file, 24 multimedia data file, 25 backup file, 26 metadata file, 30 metadata, 31a metadata management information, 31b-1 to 31b-N VOB metadata information search pointers, 31c-1 to 31c-N VOB metadata information, 32a metadata general information, 32b video shot map information, 33a video shot map general information, 33b-1 to 33b-M video shot entries, 34a video shot start time information, 34b video shot end time information, 34c video shot importance level, 40 program chain information, 41 program, 42 cell, 43 video object information, 44 time map table, 45 video data and audio data, 50 importance map vertical axis, 51 importance map horizontal axis, 52 importance level graph, 53 importance map upper limit value, 54 importance map lower limit value, 55 encoding rate map vertical axis, 56 encoding rate map horizontal axis, 57 encoding rate setting graph, 58 encoding rate conversion upper limit value, 59 encoding rate conversion lower limit value, 60 encoding rate setting graph (before correction), 61 average encoding rate (before correction), 62 encoding rate setting graph (after correction), 63 target average encoding rate, 70 encoding rate setting graph (before correction), 71 average encoding rate (before correction), 72 encoding rate setting graph (after correction), 73 target average encoding rate, 74 encoding rate upper limit value, 80 importance level graph, 81 threshold, 82 importance level graph, 83 dubbing control sequence diagram, 100 moving image encoding apparatus, 101 system control unit, 102 hard disk drive, 103 optical disc, 104 recording/reproducing drive, 110 input video signal, 111 input audio signal, 112 video encoder, 113 audio encoder, 114 multiplexer, 115 buffer memory, 120 metadata generation unit, 121 metadata analysis unit, 122 recording rate map holding unit, 123 recording rate changing means, 130 demultiplexer, 131 video decoder, 132 output video signal, 133 audio decoder, 134 output audio signal, 140 monitor, 142 video feature extraction unit, 143 audio feature extraction unit.

Claims

Audio feature extraction means for dividing a video signal and an audio signal into predetermined intervals, and extracting audio features of the audio signals in each interval;
Coding rate determining means for determining a coding rate based on the characteristics of the speech in each section;
A moving image encoding apparatus comprising: means for encoding the video signal and the audio signal at a determined encoding rate.

The encoding rate determining means includes
Feature data representing the voice characteristics of each section and feature data generating means for generating the start position of each section;
Feature data analysis means for creating a coding rate map based on the feature data;
The moving image coding apparatus according to claim 1, further comprising coding rate generation means for generating and outputting data indicating a coding rate of each section based on the coding rate map.

The moving image encoding apparatus according to claim 2, wherein the feature data generation means generates and holds feature information indicating the presence or absence of viewing value based on a physical change amount in the audio signal.

The moving image encoding apparatus according to claim 2, wherein the feature data analysis means, when generating the encoding rate map from the feature data, sets as the encoding rate a value within a range between a predetermined upper limit value and a predetermined lower limit value.

Means for recording the encoded video and audio signals on a recording medium;
The encoding rate generation means includes
The data indicating the coding rate is generated not only based on the coding rate map but also based on the free space of the recording medium used for recording the coded video signal and audio signal. The moving image encoding device according to 1.

Video feature extraction means for extracting video features represented by the video signal of each section;
The encoding parameter determining means includes
The moving image encoding apparatus according to claim 1, wherein an encoding rate is determined based on not only the audio feature of each section but also the video feature.

The moving picture encoding apparatus according to claim 6, wherein the encoding parameter determination unit extracts the video feature based on a physical change amount of the video signal.

Means for recording the encoded video and audio signals on a recording medium;
The code rate determining means includes
The moving picture encoding apparatus according to claim 1, wherein the encoding rate is determined in accordance with a free space of the recording medium used for recording the encoded video signal and audio signal.

Means for recording the encoded video and audio signals on a recording medium;
When the feature amount representing the feature generated by the feature data generation unit has a value equal to or less than a preset threshold value, the recording of the video signal and the audio signal in the section to the recording medium is skipped. The moving picture encoding apparatus according to claim 1.

An audio feature extraction step of dividing the video signal and the audio signal into predetermined intervals, and extracting an audio feature represented by the audio signal of each interval;
An encoding rate determining step for determining an encoding rate based on a feature of speech represented by the speech signal of each section;
A video encoding method comprising: encoding the video signal and the audio signal at a determined encoding rate.

A first recording step of encoding a video signal and an audio signal supplied from the outside and recording them in a recording means;
An audio feature extraction step of dividing the video signal and the audio signal recorded in the recording means into predetermined intervals, and extracting audio features of the audio signal in each interval;
An encoding rate determining step for determining an encoding rate based on a feature of speech represented by the speech signal of each section;
A reproduction step of reading out and decoding the video signal and the audio signal from the recording means;
A recording / reproducing method comprising: a second recording step of encoding the decoded video signal and audio signal at the encoding rate determined in the encoding rate determining step and recording the encoded signal on the recording means.

The recording means includes a first recording medium and a second recording medium;
In the first recording step, recording is performed on the first recording medium,
The recording / reproducing method according to claim 11, wherein recording is performed on the second recording medium in the second recording step.