JP2005323158A - Video- and audio-signal encoding device

Info

Publication number: JP2005323158A
Application number: JP2004139667A
Authority: JP
Inventors: Fumitaka Nakayama; 文貴中山
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2004-05-10
Filing date: 2004-05-10
Publication date: 2005-11-17

Abstract

PROBLEM TO BE SOLVED: To provide a video and audio signal encoding device that can generate encoded data with higher picture and sound quality than data encoded with the encoding parameters set by a user, or that can make effective use of a recording medium with a small recording capacity.

SOLUTION: The video and audio signal encoding device generates fixed-length packets from an encoded signal obtained by encoding a video input signal and an audio input signal with a given encoding system. Using the group of encoding parameters needed for the encoding, the device finds a first predicted code length for encoding with that system and the stuffing length needed for fixed-length packetization. It then changes at least one parameter of the group so that encoding matches a second predicted code length, which differs from the first and is obtained by taking the stuffing length into account.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a video and audio signal encoding apparatus, and for example to a method of packetizing and multiplexing a video signal and an audio signal.

MPEG is a common system for encoding video and audio signals. In MPEG2, which is used in broadcasting and elsewhere, the encoded video signal output from a video signal encoding circuit (hereinafter also called the video elementary stream) and the encoded audio signal output from an audio signal encoding circuit (hereinafter also called the audio elementary stream) are multiplexed into TS packets, which make up what is called a transport stream, as described for example in Patent Document 1 below.

The TS packet is described next. A TS packet is a fixed-length packet of 188 bytes. First, a delimiting header is added to each elementary stream obtained by encoding the video and audio signals, producing packets called PES packets (hereinafter, audio PES packets and video PES packets). Each PES packet carries the encoding information, bit length, and similar data for its elementary stream. Each PES packet is then divided into 184-byte pieces, a 4-byte header is added to each piece to produce a TS packet, and the TS packets are multiplexed with system packets and the like to form a stream.

Audio multiplexing is taken as an example here. MPEG1 Layer 2 (commonly called MP2) is a typical encoding method for audio recorded by digital video cameras and on DVDs.

MP2 encodes 1152 samples as one frame; in other words, the elementary stream is formed with 1152 samples as the processing unit of encoding. For this reason, it is known that generating PES packets in units of one frame is appropriate when random access and ease of time stamp management are taken into account.

In this way, an audio PES packet is generated by adding encoding information, bit length, time information, and so on to an audio elementary stream encoded in units of one audio frame. Fixed-length 188-byte audio TS packets are generated from that audio PES packet and multiplexed with other audio TS packets and system TS packets.

Patent Document 1: JP 2001-44957 A

However, generating TS packets from PES packets that each hold one frame, as described above, causes the following problem.

In MP2, one frame is 1152 samples, and a coding bit rate of 256 kbps and a sampling frequency of 48 kHz are commonly used. The bit length of the audio elementary stream generated per frame is given by

(number of samples per frame) * (bit rate / sampling frequency)

so under these conditions the audio elementary stream for one frame is 6144 bits, that is, 768 bytes.
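As a quick check of the arithmetic above, the frame length can be computed directly. This is a minimal sketch; the function name `frame_length_bytes` is made up for illustration and does not appear in the patent.

```python
def frame_length_bytes(samples_per_frame: int, bitrate_bps: int, sampling_hz: int) -> float:
    """Length of one encoded audio frame in bytes.

    Implements: samples per frame * (bit rate / sampling frequency),
    which yields bits, then converts the result to bytes.
    """
    bits = samples_per_frame * bitrate_bps / sampling_hz
    return bits / 8


# MP2 conditions quoted in the text: 1152 samples, 256 kbps, 48 kHz.
mp2_frame = frame_length_bytes(1152, 256_000, 48_000)
print(mp2_frame)  # 768.0
```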

A PES header of about 20 bytes is added to this audio elementary stream to form an audio PES packet, so the audio PES packet is about 790 bytes long.

When 188-byte TS packets are generated from this audio PES packet, five TS packets result. Four of them consist only of audio data, but the last one must be filled with about 130 bytes of stuffing data to bring it up to the fixed 188-byte length. This stuffing data is, so to speak, wasted data, so bit usage is inefficient. On a large-capacity recording medium such as magnetic tape, this amount of stuffing is small enough to ignore, but on a small-capacity recording medium such as an optical disc or a memory card, the limited capacity can hardly be said to be used efficiently.
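The packet count and stuffing amount quoted above can be reproduced with a simplified packetizer. The sketch below is an illustration under stated assumptions, not the patent's multiplexer: the 4-byte header is a placeholder in which only the sync byte 0x47 is meaningful, and stuffing is modelled as plain 0xFF padding at the end of the payload, whereas a real transport stream carries stuffing in the adaptation field. The name `packetize` is hypothetical.

```python
TS_PACKET_SIZE = 188                                # fixed TS packet length in bytes
TS_HEADER_SIZE = 4                                  # TS header length in bytes
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE   # 184 bytes of payload


def packetize(pes_packet: bytes):
    """Split a PES packet into 188-byte packets.

    Returns (packets, stuffing), where stuffing is the number of padding
    bytes that had to be added to fill the last packet.
    """
    header = bytes([0x47, 0x00, 0x00, 0x00])  # placeholder header (assumption)
    packets = []
    stuffing = 0
    for offset in range(0, len(pes_packet), TS_PAYLOAD_SIZE):
        chunk = pes_packet[offset:offset + TS_PAYLOAD_SIZE]
        pad = TS_PAYLOAD_SIZE - len(chunk)    # non-zero only for a short last chunk
        stuffing += pad
        packets.append(header + chunk + b"\xff" * pad)
    return packets, stuffing


# A PES packet of about 790 bytes: 768-byte frame plus a header of roughly 20 bytes.
packets, stuffing = packetize(bytes(790))
print(len(packets), stuffing)  # 5 130
```

Roughly 130 of the 940 bytes written for each audio frame carry no audio, which is the inefficiency the invention targets.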

The present invention has been made in view of this problem. Its object is to provide a video and audio signal encoding apparatus that can generate encoded data with higher picture and sound quality than data encoded with the encoding parameters set by the user, or that can make effective use of a recording medium with a small recording capacity.

To solve the above problem, the present invention has the following configurations.

(1) A video and audio signal encoding apparatus that generates fixed-length packets from an encoded signal obtained by encoding a video input signal and an audio input signal by a predetermined encoding method, wherein
the apparatus uses the group of encoding parameters required for the encoding to obtain a first predicted code length for encoding by the predetermined encoding method and the stuffing length required for the fixed-length packetization, and
the apparatus changes at least one encoding parameter of the group and encodes so as to conform to a second predicted code length obtained by taking the stuffing length into account in the first predicted code length.

(2) The video and audio signal encoding apparatus according to (1), wherein the second predicted code length is the first predicted code length plus the stuffing length.

(3) The video and audio signal encoding apparatus according to (1), wherein the second predicted code length is the first predicted code length minus the code length obtained by subtracting the stuffing length from the packet length of the fixed-length packet.

(4) The video and audio signal encoding apparatus according to any one of (1) to (3), wherein the user of the recording apparatus can freely switch between the method of (2) and the method of (3) for calculating the second predicted code length.

(5) The video and audio signal encoding apparatus according to (1), wherein the encoding parameter group is not changed when there is no stuffing length or when the stuffing length is less than a predetermined bit length.

(6) The video and audio signal encoding apparatus according to (1), wherein one of the encoding parameters is the bit rate used when the video input signal and the audio input signal are encoded by the predetermined encoding method.

(7) The video and audio signal encoding apparatus according to (1), wherein one of the encoding parameters is the sampling frequency used when the audio input signal is encoded by the predetermined encoding method.

(8) The video and audio signal encoding apparatus according to (1), wherein one of the encoding parameters is the predetermined number of samples used when the audio input signal is encoded by the predetermined encoding method.

(9) The video and audio signal encoding apparatus according to (1) or (8), wherein the predetermined number of samples is uniquely determined once the encoding method is decided.

(10) The video and audio signal encoding apparatus according to any one of (1) and (6) to (9), wherein the predetermined number of samples cannot be changed when at least one encoding parameter of the group is changed.

(11) The video and audio signal encoding apparatus according to any one of (1) and (6) to (9), wherein the user of the apparatus can select the conditions under which at least one encoding parameter of the group is changed.

(12) The video and audio signal encoding apparatus according to (1), wherein the fixed-length packet is an MPEG2 transport stream packet.

According to the present invention, the encoding parameters given by the user can be changed so that the bytes that would otherwise be stuffing for fixed-length packetization are used as valid data such as video and audio. The apparatus can therefore generate encoded data with higher picture and sound quality than data encoded with the parameters the user set, or make effective use of a recording medium with a small recording capacity.

Embodiments of the present invention are described below with reference to the accompanying drawings.

The first embodiment is described with reference to FIGS. 1, 2, 3, and 5. FIG. 1 is an overall system configuration diagram of the first embodiment, FIG. 2 is a flowchart explaining the embodiment, FIG. 3 is a circuit implementation of FIG. 2, and FIG. 5 illustrates the encoding parameter changing process.

The first embodiment is a preferred form for encoding an audio signal with MPEG1 Layer 2 (MP2), the format used in current digital video cameras, and converting it into TS packets.

In FIG. 1, 101 is a video input signal from a CCD, line input terminal, or the like (not shown), and 102 is a video signal encoding circuit that compression-encodes the video input signal 101. The video signal encoding circuit 102 uses a high-efficiency encoding method such as MPEG2, but its details are not shown in the figure. 103 is the encoded video signal produced by the video signal encoding circuit 102, called the video elementary stream.

104 is an audio input signal from a microphone, line input terminal, or the like (not shown), and 105 is an audio processor that compression-encodes the audio input signal 104. The processor 105 may be of any type, such as a CPU or a DSP, and its internal details are not shown. The function of the processor 105 can also be implemented as a hardware circuit. 107 is the encoded audio signal produced by the audio processor 105, called the audio elementary stream.

106 is a circuit that changes the encoding parameters used when the audio processor 105 encodes the audio signal; it is described in detail later.

110 is a multiplexing circuit that converts the video elementary stream 103 and the audio elementary stream 107 into TS packets and multiplexes them. 108 is the TS stream multiplexed by the multiplexing circuit 110, which is recorded on the recording medium 109.

The encoding parameter changing circuit 106 is described with reference to FIGS. 2 and 3. By default, the audio processor 105 encodes according to the encoding parameters given to the recording apparatus. The encoding parameters here are the sampling frequency used to convert the analog input signal into discrete data, the bit rate, which determines the sound quality of the encoded audio signal, and the number of samples in one frame, which is the unit of audio encoding; they can be set by the user of the recording apparatus. In the MP2 format used by current digital video cameras that record MPEG, one frame is 1152 samples, and a coding bit rate of 256 kbps and a sampling frequency of 48 kHz are commonly used.

If an audio elementary stream is generated per frame and converted into TS packets under these conditions, the last packet must be filled with stuffing bytes, as already described. The operation of the encoding parameter changing circuit 106, which changes the bit rate, is therefore described with reference to FIGS. 2 and 3. The names used in the text match the labels in the figures.

In S201, the user of the recording apparatus specifies the encoding format, which determines the number of samples in one audio frame 302 (FRAME), one of the encoding parameters. The bit rate 301 (BITRATE) and the sampling frequency 303 (SPF) may either be chosen by the user or be set to default values by the recording apparatus in its initial state.

In S202, the multiplier 305 calculates the bit length 306 (AU_SIZE) of the audio elementary stream, which is the output of encoding one frame of the audio input signal, from BITRATE 301, FRAME 302, and SPF 303. The bit length is calculated as

AU_SIZE = FRAME * (BITRATE / SPF)

The sampling frequency 303 is converted to its reciprocal by the reciprocal unit 304 before the multiplication.

In S203, the adder 308 calculates the audio PES packet length 309 (PES_SIZE) from the bit length 306 of the audio elementary stream and the bit length 307 (PESH) of the PES header.

In S204, the arithmetic unit 311 calculates the number of TS packets 312 (PACK_NUM) produced by TS packetization and the stuffing byte length 313 (STAFF) from the audio PES packet length 309 and the TS packet length 310 (PACK_SIZE). They are calculated as

PACK_NUM = (PES_SIZE / PACK_SIZE) + 1
STAFF = PACK_SIZE - (PES_SIZE % PACK_SIZE)

where % denotes the remainder and the quotient in PACK_NUM is truncated to an integer.
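Steps S202 to S204 can be traced with a short sketch. The function name and the choice to work in bytes are assumptions for illustration. The example values PESH = 20 and PACK_SIZE = 184 are also assumptions: the background section gives "about 20 bytes" for the PES header, and using the 184-byte payload as PACK_SIZE is what reproduces its figure of roughly 130 stuffing bytes.

```python
def predict_packetization(frame: int, bitrate: int, spf: int, pesh: int, pack_size: int):
    """Steps S202 to S204, with all lengths expressed in bytes.

    frame     -- samples per audio frame (FRAME)
    bitrate   -- bit rate in bits per second (BITRATE)
    spf       -- sampling frequency in Hz (SPF)
    pesh      -- PES header length in bytes (PESH)
    pack_size -- bytes available per packet (PACK_SIZE)
    Returns (AU_SIZE, PES_SIZE, PACK_NUM, STAFF).
    """
    au_size = frame * bitrate // spf // 8      # S202: bits per frame, converted to bytes
    pes_size = au_size + pesh                  # S203: add the PES header
    pack_num = pes_size // pack_size + 1       # S204: number of packets
    staff = pack_size - pes_size % pack_size   # S204: stuffing bytes in the last packet
    return au_size, pes_size, pack_num, staff


print(predict_packetization(1152, 256_000, 48_000, 20, 184))  # (768, 788, 5, 132)
```

Note that when PES_SIZE is an exact multiple of PACK_SIZE, the formulas as written give one packet too many and STAFF equal to PACK_SIZE; a caller would presumably treat that case as "no stuffing".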

In S205, the adder 314 calculates the updated audio elementary stream length 315 (AU_NEW), the bit length obtained when the stuffing bytes are used as audio data, from the stuffing byte length 313 and the audio elementary stream length 306.

In S206, the multiplier 317 calculates the updated bit rate 318 (NEW_BITRATE) corresponding to the updated audio elementary stream length 315, from the updated length 315, the sampling frequency 303, and the number of samples in one frame 302.

The number of samples in one frame 302 is converted to its reciprocal by the reciprocal unit 316 before the multiplication.

The updated bit rate is input to the audio processor 105, which then encodes the audio at that bit rate.
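Steps S205 and S206 invert the frame length formula to get a bit rate from the enlarged frame. The sketch below is illustrative; the function name is hypothetical, and the input values 768 and 132 bytes come from assuming a 20-byte PES header and a 184-byte packet payload, which the patent does not fix.

```python
def padded_bitrate(au_size: int, staff: int, frame: int, spf: int) -> float:
    """Steps S205 and S206: fold the stuffing bytes into the audio frame.

    au_size and staff are in bytes; the result is in bits per second.
    """
    au_new = au_size + staff                  # S205: AU_NEW
    new_bitrate = au_new * 8 * spf / frame    # S206: NEW_BITRATE = AU_NEW * SPF / FRAME
    return new_bitrate


# 768-byte frame, 132 stuffing bytes, 1152 samples per frame, 48 kHz.
print(padded_bitrate(768, 132, 1152, 48_000))  # 300000.0
```

Under these assumptions the 256 kbps setting becomes 300 kbps, and the resulting 900-byte frame plus the 20-byte header fills exactly five 184-byte payloads.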

FIG. 5 shows the above processing. In FIG. 5, (a) is the code length 306 (AU_SIZE) generated when encoding with the parameters set by the user of the encoding apparatus. If the code length obtained by adding the packet header length 307 (PESH) to AU_SIZE 306 is divided into fixed-length packets, a packet that needs stuffing results, as in FIG. 5(b). In contrast, as in FIG. 5(c), the code length is reset to the updated length 315 (AU_NEW) so that its sum with PESH 307 fits the fixed packet length 310 (PACK_SIZE). Based on this changed code length 315, the encoding parameter changing circuit 106 changes the bit rate, and the processor 105 encodes at that rate.

The above processing can also be modified as follows.

The method above (FIG. 5(c)) uses the stuffing bytes for meaningful data such as audio. When the recording medium has little capacity, however, the packet that would need stuffing can instead simply not be generated. That is, as in FIG. 5(d), a new audio elementary stream length is calculated so that the packet that would contain stuffing at the user-set bit rate is dropped, and encoding is performed at a bit rate updated to match that length.

With this processing, recording is done at a lower bit rate than the one the user set, but the recording capacity is used that much more effectively.
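The capacity-saving variant of FIG. 5(d) can be sketched in the same way, following configuration (3): the frame is shortened by the data that would have gone into the partly filled packet. The function name is hypothetical, and the example values again assume a 20-byte PES header and a 184-byte packet payload.

```python
def truncated_bitrate(au_size: int, staff: int, pack_size: int, frame: int, spf: int):
    """Shrink the frame so that the partly filled packet is never produced.

    Lengths are in bytes. Returns (new frame length, new bit rate in bits per second).
    """
    partial = pack_size - staff          # real data bytes in the partly filled packet
    au_new = au_size - partial           # frame length after dropping that packet
    new_bitrate = au_new * 8 * spf / frame
    return au_new, new_bitrate


au_new, rate = truncated_bitrate(768, 132, 184, 1152, 48_000)
print(au_new)        # 716
print(round(rate))   # 238667
```

With these numbers the frame plus header occupies exactly four packets instead of five, at a bit rate a little under 239 kbps rather than 256 kbps.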

If the encoding parameters set by the user produce no stuffing bytes, or if the user does not want the encoding parameters updated as described above, recording is performed under the conditions the user set.
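The three cases described so far (fold the stuffing into the audio, drop the partly filled packet, or leave the user's settings alone) can be collected into one selection routine. The sketch below is illustrative only: the function name, the `mode` strings, and the `min_staff` threshold parameter are assumptions, with the threshold modelled on configuration (5).

```python
def second_code_length(au_size: int, staff: int, pack_size: int,
                       mode: str = "pad", min_staff: int = 1) -> int:
    """Choose the frame length (in bytes) that the encoder should aim for.

    mode "pad"      -- add the stuffing bytes to the frame, as in configuration (2)
    mode "truncate" -- drop the partly filled packet, as in configuration (3)
    mode "off"      -- keep the user's parameters unchanged
    """
    # With the S204 formula, an exact fit shows up as staff == pack_size.
    no_stuffing = staff == pack_size
    if mode == "off" or no_stuffing or staff < min_staff:
        return au_size
    if mode == "pad":
        return au_size + staff
    if mode == "truncate":
        return au_size - (pack_size - staff)
    raise ValueError(f"unknown mode: {mode}")


print(second_code_length(768, 132, 184, mode="pad"))       # 900
print(second_code_length(768, 132, 184, mode="truncate"))  # 716
print(second_code_length(768, 132, 184, mode="off"))       # 768
```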

The second embodiment is described with reference to FIGS. 4 and 6. FIG. 4 is an overall system configuration diagram of the second embodiment, and FIG. 6 illustrates the target code amount changing process.

The second embodiment is a preferred form of code amount control (quantization step control) when a video signal is encoded with MPEG2, the format used in current digital video cameras and digital broadcasting.

When a video signal is encoded with MPEG2, the quantization step must be controlled for each macroblock so that an output buffer called the VBV buffer neither overflows nor underflows. The algorithm is described in TM5 (Test Model Editing Committee: "Test Model 5", ISO/IEC JTC1/SC29/WG11 N0400), so details are omitted. If the quantization step control circuit can relate the target code amount, which is derived from the free capacity of the VBV buffer, to the packet length of the fixed-length packets, the output stream can be generated with little stuffing.

FIG. 4 shows a simplified MPEG2 encoding circuit with the blocks of the present invention added, so a detailed description of MPEG encoding is omitted. 401 is the video input signal to the encoding apparatus, which is stored in the frame buffer 402. When the picture type is I picture, the data in the frame buffer 402 is sent block by block to the DCT circuit 406. The DCT circuit 406 performs a two-dimensional (vertical and horizontal) discrete cosine transform on each block, converting the signal from the time domain to the frequency domain. The DCT coefficients from the DCT circuit 406 are sent to the quantizer 407 and quantized with a predetermined quantization step width. The coefficients are then reordered by a zigzag scan (not shown) and sent to the variable length coding (VLC) unit 410, where Huffman coding is performed. The resulting encoded stream is held temporarily in the output buffer 411 and then recorded on the recording medium 412 at a constant bit rate.

When the picture type is P picture or B picture, inter-frame coding is performed. The encoded signal from the quantizer 407 is first decoded (local decoding) by the inverse quantizer 409 and the inverse DCT circuit 408, the motion detection circuit 403 and the motion compensation circuit 404 perform motion detection and motion compensation against the input video signal, the subtractor 405 calculates the difference from the input video signal, and that difference is DCT-transformed. The subsequent processing is the same as for an I picture.

To keep the output buffer 411 from overflowing or underflowing, the VLC circuit 410 sends the code amount 413 of the picture currently being encoded to the target code amount setting circuit 414, which determines the target code amount of the next picture of the same type to be encoded.

The target code amount setting circuit 414 calculates the target code amount 415 of the next picture to be encoded. This target code amount 415 is sent to the target code amount changing circuit 416, which changes it in consideration of the packet length of the fixed-length packet and the packet header length 417. The target code amount changing circuit 416 is described in detail later. From the changed target code amount 418, the quantization step control circuit 419 determines the quantization step, and the quantizer 407 quantizes the DCT coefficients produced by the DCT circuit 406.

The target code amount changing circuit 416 is now described with reference to FIG. 6. In FIG. 6, (a) shows the target code amount 415 calculated by the target code amount setting circuit 414 and the packet header length 417. If the code length obtained by adding the header length 417 to the target code amount 415 is divided into fixed-length packets, a packet that needs stuffing results, as in FIG. 6(b). The target code amount changing circuit 416 therefore resets the target code amount so that the sum of the target code amount 415 and the header length 417 is an integer multiple of the fixed packet length 420, as in FIG. 6(c) or FIG. 6(d). The changed target code amount is 418, and the quantization step control circuit 419 determines the quantization step from that value.

When the target code amount is changed, care is needed to avoid buffer overflow or underflow. That is, the target code amount setting circuit 414 must calculate a target code amount for which the buffer does not fail even if the amount is increased or decreased by one packet.

With the above processing, encoding according to the target code amount can be controlled so that stuffing bytes are kept to a minimum.
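The adjustment performed by the target code amount changing circuit 416 amounts to snapping the sum of target and header to a multiple of the packet length. The sketch below is a hedged illustration: the function name, the `round_up` flag (standing in for the choice between FIG. 6(c) and FIG. 6(d)), and the example numbers are assumptions, and the VBV buffer check that the text requires is left to the caller.

```python
def adjust_target(target: int, header_len: int, packet_len: int, round_up: bool = True) -> int:
    """Return a target code amount such that target + header_len is a
    whole multiple of packet_len. All lengths are in bytes.
    """
    total = target + header_len
    remainder = total % packet_len
    if remainder == 0:
        return target                      # already fits, nothing to change
    if not round_up and total > packet_len:
        return target - remainder          # shrink: drop the partly filled packet
    return target + (packet_len - remainder)   # grow: fill the last packet


# Example: 10000-byte target, 20-byte header, 184-byte packet payload.
print(adjust_target(10_000, 20, 184, round_up=True))   # 10100
print(adjust_target(10_000, 20, 184, round_up=False))  # 9916
```

Either result changes the target by less than one packet, which is why the text asks the target code amount setting circuit 414 to leave one packet of headroom in the buffer.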

FIG. 1: Block diagram showing the configuration of the encoding apparatus in the first embodiment.
FIG. 2: Flowchart showing the processing of the encoding parameter changing circuit in the first embodiment.
FIG. 3: Circuit diagram of the encoding parameter changing circuit in the first embodiment.
FIG. 4: Block diagram showing the configuration of the encoding apparatus in the second embodiment.
FIG. 5: Diagram showing the state of the audio data under code amount control.
FIG. 6: Diagram showing the state of the audio data under code amount control.

Explanation of symbols

101 Video input signal
102 Video signal encoding circuit
103 Video elementary stream
104 Audio input signal
105 Audio processor
106 Encoding parameter changing circuit
107 Audio elementary stream
108 TS stream
109 Recording medium
110 Multiplexing circuit
301 Bit rate
302 Number of samples in one audio frame
303 Sampling frequency
304 Reciprocal unit
305 Multiplier
306 Bit length of the audio elementary stream
307 Bit length of the PES header
308 Adder
309 Audio PES packet length
310 TS packet length
311 Arithmetic unit
312 Number of TS packets
313 Stuffing byte length
314 Adder
315 Updated audio elementary stream length
316 Reciprocal unit
317 Multiplier
318 Updated bit rate
401 Video input signal
402 Frame buffer
403 Motion detection circuit
404 Motion compensation circuit
405 Subtractor
406 DCT circuit
407 Quantizer
408 Inverse DCT circuit
409 Inverse quantizer
410 Variable length encoder
411 Output buffer
412 Recording medium
413 Code amount of the picture
414 Target code amount setting circuit
415 Target code amount of the picture
416 Target code amount changing circuit
417 Packet header length
418 Changed target code amount
419 Quantization step control circuit
420 Fixed packet length

Claims

1. A video and audio signal encoding apparatus that generates fixed-length packets from an encoded signal obtained by encoding a video input signal and an audio input signal by a predetermined encoding method, wherein
the apparatus uses the group of encoding parameters required for the encoding to obtain the stuffing length required for the fixed-length packetization and a first predicted code length for encoding by the encoding method, and
the apparatus changes at least one encoding parameter of the group and encodes so as to conform to a second predicted code length, different from the first predicted code length, obtained by taking the stuffing length into account in the first predicted code length.

2. The video and audio signal encoding apparatus according to claim 1, wherein the second predicted code length is the first predicted code length plus the stuffing length.

3. The video and audio signal encoding apparatus according to claim 1, wherein the second predicted code length is the first predicted code length minus the code length obtained by subtracting the stuffing length from the packet length of the fixed-length packet.

4. The video and audio signal encoding apparatus according to any one of claims 1 to 3, wherein the user of the recording apparatus can freely switch between the method of claim 2 and the method of claim 3 for calculating the second predicted code length.

5. The video and audio signal encoding apparatus according to claim 1, wherein the encoding parameter group is not changed when there is no stuffing length or when the stuffing length is less than a predetermined bit length.

6. The video and audio signal encoding apparatus according to claim 1, wherein one of the encoding parameters is the bit rate used when the video input signal and the audio input signal are encoded by the predetermined encoding method.

7. The video and audio signal encoding apparatus according to claim 1, wherein one of the encoding parameters is the sampling frequency used when the audio input signal is encoded by the predetermined encoding method.

8. The video and audio signal encoding apparatus according to claim 1, wherein one of the encoding parameters is the predetermined number of samples used when the audio input signal is encoded by the predetermined encoding method.

9. The video and audio signal encoding apparatus according to claim 1 or 8, wherein the predetermined number of samples is uniquely determined once the encoding method is decided.

10. The video and audio signal encoding apparatus according to any one of claims 1 and 6 to 9, wherein the predetermined number of samples cannot be changed when at least one encoding parameter of the group is changed.

11. The video and audio signal encoding apparatus according to any one of claims 1 and 6 to 9, wherein the user of the apparatus can select the conditions under which at least one encoding parameter of the group is changed.

12. The video and audio signal encoding apparatus according to claim 1, wherein the fixed-length packet is an MPEG2 transport stream packet.