JP2008197199A

JP2008197199A - Audio encoder and audio decoder

Info

Publication number: JP2008197199A
Application number: JP2007030062A
Authority: JP
Inventors: Takatoshi Nishio; 卓敏西尾
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2007-02-09
Filing date: 2007-02-09
Publication date: 2008-08-28

Abstract

PROBLEM TO BE SOLVED: To provide an audio encoder and an audio decoder that allow the audio decoder, when reproducing audio, to process the compressed audio data itself or the like, so that the decoded audio data is processed before being output.

SOLUTION: The audio encoder includes: an encoding means 101 that encodes the input audio data to output encoded data and also outputs encoding-related data concerning the encoding; a sound volume level output means 102 that performs a predetermined measurement on the audio data over a predetermined period and outputs a sound volume level of the audio data; and an auxiliary data output means 103 that creates and outputs auxiliary data from the sound volume level and the encoding-related data output from the encoding means 101.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an audio encoding device that performs digital signal processing to encode audio, and to an audio decoding device that reproduces the encoded data.

Some conventional audio encoding devices and their decoding devices output music by accessing portions of the medium that are not music information (see, for example, Patent Document 1).

FIG. 13 shows the format on a CD. FIG. 14 shows a flowchart of a conventional audio decoding device.
The operation of the conventional audio decoding device is described below following the flowchart of FIG. 14, with FIG. 13 as a supplement.

In FIG. 14, the audio decoding device reads the data in the lead-in area (1300 in FIG. 13) to obtain the TOC information of the disc (step ST1). It extracts the control code of each track from the TOC information (step ST2) and determines whether the CD is a CD-DA audio disc or a CD-ROM disc (step ST3). If the CD is a CD-DA audio disc, the audio decoding device performs CD audio playback processing (step ST12) based on the TOC information and plays music as an ordinary CD player.

On the other hand, if the disc is determined to be a CD-ROM disc, the audio decoding device reads the contents of the CD-ROM path table (1301 in FIG. 13) to obtain the file structure of the CD-ROM (step ST4). The audio decoding device then checks the file structure and determines, from file extensions and the like, whether a compressed audio file exists (step ST5); if no compressed audio file exists, processing ends. If a compressed audio file exists, the audio decoding device next checks, again from file extensions and the like, whether an information file (1303 in FIG. 13) exists (step ST6). If no information file exists, the audio decoding device refers to the path table (1301 in FIG. 13), extracts the compressed audio files based on the contents of the CD-ROM file system (step ST10), starts a compressed audio playback program (step ST11), and plays the compressed audio.

On the other hand, if an information file exists in step ST6, the audio decoding device reads its contents (step ST7) and creates a music playlist (step ST8). The audio decoding device also extracts each file name and playback start address from the playlist (step ST9).
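As an illustrative aid only, the branching of FIG. 14 described above can be sketched in Python as follows. The function name and the boolean parameters are hypothetical and are not part of the cited device; the text does not state what follows step ST9, so the sketch stops there.

```python
def conventional_playback_steps(is_cd_da, has_compressed_audio=False,
                                has_info_file=False):
    """Return the step labels visited in the FIG. 14 flow (hypothetical helper)."""
    # ST1: read TOC, ST2: extract control codes, ST3: decide the disc type.
    steps = ["ST1", "ST2", "ST3"]
    if is_cd_da:
        # CD-DA disc: ordinary CD audio playback.
        return steps + ["ST12"]
    # CD-ROM disc: ST4 reads the path table, ST5 looks for compressed audio.
    steps += ["ST4", "ST5"]
    if not has_compressed_audio:
        return steps  # processing ends
    steps.append("ST6")  # look for an information file
    if has_info_file:
        # Read the file, build the playlist, extract names and start addresses.
        return steps + ["ST7", "ST8", "ST9"]
    # No information file: extract files via the path table and start playback.
    return steps + ["ST10", "ST11"]
```

For example, `conventional_playback_steps(True)` visits ST1 to ST3 and then ST12.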

Note that the audio encoding device performs encoding in the format shown in FIG. 13 so that the result can be reproduced by the above audio decoding device.

Patent Document 1: JP 2006-114148 A

When audio data is reproduced by such a conventional audio decoding device, the device uses data other than the compressed audio data output by the audio encoding device to create and display a music playlist as described above, and the user can look at the list and perform playback operations while associating title names with the audio data and the like.

However, the data other than the compressed audio data output by the conventional audio encoding device is information such as playback start addresses and title names linked to the compressed audio data, and is not information used for processing the audio data itself. There has therefore been a problem that the audio decoding device cannot process the compressed audio data itself or the like and output audio data that differs from the original audio data, for example audio at a volume different from that of the original audio data.

The present invention has been made to solve the above problem, and its object is to provide an audio encoding device and an audio decoding device that make it possible, when audio is reproduced by the audio decoding device, to process the compressed audio data itself or the like and to output audio data that differs from the original audio data.

To solve the above problem, an audio encoding device according to claim 1 of the present invention is an audio encoding device that receives audio data and outputs encoded data obtained by encoding the audio data together with auxiliary data related to the audio data, the device comprising: encoding means that encodes the input audio data to output the encoded data and also outputs encoding-related data concerning the encoding of the audio data; volume level output means that obtains and outputs a volume level of the audio data for each predetermined period; and auxiliary data output means that creates auxiliary data including the volume level from the volume level output by the volume level output means and the encoding-related data output by the encoding means.

As a result, auxiliary data including volume level information of the audio data can be output together with the encoded data of the audio data, which makes it possible to process the audio data using the volume level information.

An audio encoding device according to claim 2 of the present invention is the audio encoding device according to claim 1, wherein the predetermined period is the duration of one piece of music.

As a result, auxiliary data including volume level information of the audio data can be output together with the encoded data of the audio data, which makes it possible to process the audio data using the volume level information. In particular, when the volume level differs between pieces of music in the original audio data, the volume can be adjusted to a constant level.

An audio encoding device according to claim 3 of the present invention is the audio encoding device according to claim 1, wherein the volume level output means obtains the volume level based on the maximum volume and the average volume of the audio data in the predetermined period.

As a result, auxiliary data including accurately detected volume level information can be output together with the encoded data of the audio data.

An audio decoding device according to claim 4 of the present invention is an audio decoding device that decodes and outputs the encoded data output from the audio encoding device according to claim 1, the device comprising: decoding means that decodes the encoded data output from the audio encoding device and outputs audio data; volume level extracting means that extracts the volume level from the auxiliary data output from the audio encoding device; and volume adjusting means that adjusts the volume of the audio data output by the decoding means based on the volume level extracted by the volume level extracting means, and outputs the result.

As a result, the amplitude of the decoded audio data can be adjusted using the volume level information included in the auxiliary data output from the audio encoding device.

An audio encoding device according to claim 5 of the present invention is an audio encoding device that receives audio data and outputs encoded data obtained by encoding the audio data together with auxiliary data related to the audio data, the device comprising: encoding means that encodes the input audio data for each frame, which is the unit of encoding, to output the encoded data and also outputs encoding-related data concerning the encoding of the audio data; category classifying means that classifies the audio data, frame by frame, into one of a plurality of categories and outputs category information indicating the category into which each frame has been classified; and auxiliary data output means that creates auxiliary data including the category information from the category information output by the category classifying means and the encoding-related data output by the encoding means.

As a result, auxiliary data including category information of the audio data can be output together with the encoded data of the audio data, which makes it possible to process the audio data using the category information.

An audio encoding device according to claim 6 of the present invention is the audio encoding device according to claim 5, wherein the auxiliary data includes pairs each consisting of a value indicating a category and the number of consecutive frames classified into that category.

As a result, the amount of data embedded in the auxiliary data can be reduced.
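The pairing of a category value with a count of consecutive frames amounts to a run-length representation. A minimal Python sketch of that idea follows; the function names and the integer category values are hypothetical illustrations and are not defined by this specification.

```python
from itertools import groupby


def to_runs(frame_categories):
    """Collapse a per-frame category list into (category, run length) pairs."""
    return [(category, sum(1 for _ in group))
            for category, group in groupby(frame_categories)]


def from_runs(runs):
    """Expand (category, run length) pairs back into one category per frame."""
    return [category for category, count in runs for _ in range(count)]


# Six frames: three of category 0, two of category 1, one of category 0.
runs = to_runs([0, 0, 0, 1, 1, 0])
```

Here six per-frame values collapse into three pairs, and `from_runs(runs)` restores the original per-frame list.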

An audio decoding device according to claim 7 of the present invention is an audio decoding device that decodes and outputs the encoded data output from the audio encoding device according to claim 5, the device comprising: decoding means that decodes the encoded data output from the audio encoding device and outputs audio data; category extracting means that extracts the category information from the auxiliary data output from the audio encoding device; category specifying means that specifies, among the plurality of categories, a category whose audio output is to be controlled; and audio output means that, based on the category information extracted from the auxiliary data and the specified category output by the category specifying means, outputs the decoded audio data while applying output control to the audio output of the frames belonging to the specified category.

As a result, the audio output of sound belonging to a given category can be controlled using the category information.

An audio decoding device according to claim 8 of the present invention is the audio decoding device according to claim 7, wherein the category specifying means specifies, among the plurality of categories, a category that is not to be output as audio, and the audio output means performs control so that, of the audio data output by the decoding means, the frames belonging to the category specified by the category specifying means are not output as audio.

As a result, the category information can be used to suppress the audio output of sound belonging to a given category.

An audio decoding device according to claim 9 of the present invention is an audio decoding device that decodes and outputs the encoded data output from the audio encoding device according to claim 5, the device comprising: category extracting means that extracts the category information from the auxiliary data output from the audio encoding device; output prohibition category specifying means that specifies, among the plurality of categories, a category that is not to be output as audio; and decoding means that, based on the category information extracted from the auxiliary data and the output prohibition category output by the output prohibition category specifying means, decodes the encoded data output from the audio encoding device except for the encoded data of the frames belonging to the output prohibition category, and outputs audio data.

An audio decoding device according to claim 10 of the present invention is the audio decoding device according to claim 9, further comprising audio output means that is provided downstream of the decoding means and that controls the output of the audio data decoded by the decoding means, in accordance with the output of the output prohibition category specifying means, so that the period during which data left as encoded data without being decoded by the decoding means is output remains constant.

As a result, the category information can be used to suppress the audio output of sound belonging to a given category, and audio can be output at constant intervals regardless of the number of frames that are not decoded.

An audio encoding device according to claim 11 of the present invention is an audio encoding device that receives audio data from a plurality of audio sound sources and outputs encoded data obtained by encoding the audio data together with auxiliary data related to the audio data, the device comprising: encoding means that outputs, together with the encoded data obtained by encoding the audio data from the plurality of audio sound sources, band data indicating, for one or more of the plurality of audio sound sources, into which of a plurality of audio bands used in encoding the audio data from that sound source has been placed, and encoding-related data concerning the encoding; and auxiliary data output means that creates auxiliary data including the band data from the band data and the encoding-related data output by the encoding means.

As a result, auxiliary data including the band data of the audio data can be output together with the encoded data of the audio data, which makes it possible to process the audio data using the band data.

An audio encoding device according to claim 12 of the present invention is the audio encoding device according to claim 11, wherein the encoding means encodes the audio data of a given audio sound source after restricting it so that it fits within a given audio band of the encoded data, and outputs the encoded data.

As a result, the audio data of the given audio sound source can be placed in a narrower band.

An audio decoding device according to claim 13 of the present invention is an audio decoding device that decodes and outputs the encoded data output from the audio encoding device according to claim 11, the device comprising: band data extracting means that extracts the band data from the auxiliary data output from the audio encoding device; sound source specifying means that specifies, among the one or more audio sound sources, an audio sound source whose audio output is to be controlled; and decoding means that, based on the band data extracted from the auxiliary data and the specified sound source output by the sound source specifying means, decodes and outputs the encoded data of the band containing the audio data of the specified sound source while limiting the amplitude of that audio data.

As a result, the audio output of any one of the plurality of audio sound sources can be controlled as desired using the band data.

According to the present invention, the audio encoding device outputs, together with the encoded data, auxiliary data including the volume level of the audio data, the category of each frame of the audio data, or data indicating the band in which an audio sound source has been placed. Therefore, when audio is reproduced by the audio decoding device, the audio data itself or the like can be processed using the volume level and other information included in the auxiliary data, and the decoded audio data can be processed and output. A highly useful audio encoding device and audio decoding device can thus be provided.

(Embodiment 1)
An audio encoding device and an audio decoding device according to Embodiment 1 of the present invention are described below.
FIG. 1 is a block diagram showing the configuration of the audio encoding device according to Embodiment 1. In FIG. 1, 101 denotes encoding means that encodes the input audio data and outputs encoded data, 102 denotes volume level output means that obtains and outputs the volume level of the input audio data, and 103 denotes auxiliary data output means that creates and outputs auxiliary data relating to the audio data. The audio encoding device 100 comprises the encoding means 101, the volume level output means 102, and the auxiliary data output means 103.

FIG. 2 is a block diagram showing the configuration of the audio decoding device according to Embodiment 1. In FIG. 2, 201 denotes volume level extracting means that extracts the volume level from the input auxiliary data and outputs a volume magnification corresponding to the extracted volume level, 202 denotes decoding means that decodes the encoded data and outputs audio data, and 203 denotes volume adjusting means that adjusts the volume of the audio data output by the decoding means 202 and outputs the audio. The audio decoding device 200 comprises the volume level extracting means 201, the decoding means 202, and the volume adjusting means 203.

FIG. 3 is a diagram showing the format of an MPEG1-Layer II audio frame. In FIG. 3, the audio frame includes a header area, an Error_check area, an audio_data area, and an ancillary_data area. The header area includes fields for various data relating to encoding: "syncword", "ID", "layer", "protection_bit", "bitrate_index", "sampling_frequency", "padding_bit", "private_bit", "mode", "mode_extension", "copyright", "original/copy", and "emphasis".

Next, the operations of the audio encoding device and the audio decoding device according to Embodiment 1 are described.
First, the operation of the audio encoding device 100 according to Embodiment 1 is described.

As shown in FIG. 1, the audio data is input to the encoding means 101 and the volume level output means 102. On receiving the audio data, the encoding means 101 encodes it and outputs encoded data. The encoding means 101 also outputs encoding-related data concerning this encoding to the auxiliary data output means 103. When encoding follows the MPEG1-Layer II rules, the encoding-related data output includes, as shown in FIG. 3, fields contained in the header area of the audio frame such as "bitrate_index" indicating the bit rate, "mode" indicating the channel mode, and "mode_extension" indicating the band boundary.

Meanwhile, on receiving the audio data, the volume level output means 102 obtains the peak volume and the average volume over one piece of music, and obtains the volume level by weighting "peak value : average volume" at a ratio of "8 : 2". For example, when the measured peak volume is "6" and the average volume is "4", the volume level output means 102 outputs "5.6", obtained as 6 × 0.8 + 4 × 0.2, as the volume level of that piece of music. The method of determining the volume level is not limited to this one: the peak value itself or the average volume itself may be output as the volume level. Alternatively, the volume level may be obtained by some other predetermined calculation that takes the peak value and the average volume as variables, different from weighting them at a predetermined ratio. The volume level output means 102 outputs the volume level obtained in this way to the auxiliary data output means 103.
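The 8 : 2 weighting described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the measurement method of the embodiment: the function name is hypothetical, and taking the absolute sample value as the "volume" of each sample is an assumption, since the specification does not say how volume is measured.

```python
def volume_level(samples, peak_weight=0.8, average_weight=0.2):
    """Weighted combination of peak and average volume over one period."""
    if not samples:
        raise ValueError("samples must not be empty")
    # Assumption: the volume of a sample is its absolute value.
    magnitudes = [abs(s) for s in samples]
    peak = max(magnitudes)
    average = sum(magnitudes) / len(magnitudes)
    return peak_weight * peak + average_weight * average


# Peak 6 and average 4, as in the example in the text.
level = volume_level([6, 2, 4])
```

With a peak of 6 and an average of 4 the result is approximately 5.6, matching the worked example. Setting `peak_weight=1.0, average_weight=0.0` gives the peak-only variant mentioned in the text.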

On receiving the encoding-related data from the encoding means 101 and the volume level from the volume level output means 102, the auxiliary data output means 103 converts them into a format based on MPEG1-Layer II shown in FIG. 3 and outputs the result as auxiliary data.

In Embodiment 1, the auxiliary data is created by placing the volume level information in the "private_bit" portion of the header area in the MPEG1-Layer II stream. The auxiliary data output means 103 embeds the volume level information received from the volume level output means 102 in the 2-bit "private_bit" contained in the header area of the audio frame. Using the 2-bit "private_bit", four volume levels "0", "1", "2", and "3", represented for example by "00", "01", "10", and "11", can be included in the auxiliary data. Because Embodiment 1 embeds the volume level information in an area that is open to the user, a conventional decoding device that does not treat the data in that area as significant can still decode the stream without its decoding operation being affected. In addition, because an area already provided by the standard is used, including the volume level information does not increase the bit count.
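The correspondence between the four volume levels and the 2-bit patterns can be illustrated with the following Python sketch. The function names are hypothetical, and the field is modeled as a standalone 2-character bit string rather than packed into an actual frame header. How a measured value such as 5.6 is mapped onto the four steps is not stated in the text, so the sketch takes the step (0 to 3) as given.

```python
def level_to_bits(level):
    """Return the 2-bit pattern ("00".."11") for a volume level 0..3."""
    if level not in (0, 1, 2, 3):
        raise ValueError("volume level must be 0, 1, 2, or 3")
    return format(level, "02b")


def bits_to_level(bits):
    """Recover the volume level 0..3 from a 2-bit pattern."""
    if len(bits) != 2 or any(b not in "01" for b in bits):
        raise ValueError("expected exactly two binary digits")
    return int(bits, 2)
```

For instance, level 2 corresponds to the pattern "10", and the pattern "11" decodes back to level 3.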

Through the above operation, the audio encoding device 100 according to Embodiment 1 outputs auxiliary data including the volume level of the audio data together with the encoded data of the audio data.

Next, the operation of the audio decoding device according to Embodiment 1 is described.
As shown in FIG. 2, on receiving the auxiliary data, the volume level extracting means 201 extracts the information embedded in the "private_bit", that is, the volume level information.

After extracting the volume level from the auxiliary data, the volume level extracting means 201 outputs a volume magnification to the volume adjusting means 203 according to the rule shown in the table of FIG. 4. In Embodiment 1, as shown in FIG. 4, the volume magnification is 1.1 when the volume level is "0", 1.0 when the volume level is "1", 0.9 when the volume level is "2", and 0.8 when the volume level is "3". The relationship between the volume level and the volume magnification is not limited to this one.

On receiving the encoded data, the decoding means 202 decodes it and outputs the decoded audio to the volume adjusting means 203.

The volume adjusting means 203 multiplies the amplitude of the audio input from the decoding means 202 by the volume magnification corresponding to the volume level extracted by the volume level extracting means 201, and outputs the result.
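The lookup of the magnification from the FIG. 4 table and its application to decoded samples can be sketched in Python as follows. The names `GAIN_BY_LEVEL` and `adjust_volume` are hypothetical, and representing decoded audio as a list of numeric samples is an assumption made for illustration.

```python
# Volume level -> magnification, as given for FIG. 4 in the text.
GAIN_BY_LEVEL = {0: 1.1, 1: 1.0, 2: 0.9, 3: 0.8}


def adjust_volume(decoded_samples, volume_level):
    """Scale decoded sample amplitudes by the magnification for the level."""
    gain = GAIN_BY_LEVEL[volume_level]
    return [sample * gain for sample in decoded_samples]
```

A frame tagged with level 1 passes through unchanged, while a frame tagged with level 2 has every sample scaled to 0.9 times its decoded amplitude.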

FIG. 5 is a diagram showing how the volume adjusting means 203 of the audio decoding device according to Embodiment 1 changes the audio amplitude. It shows how the amplitude of the audio data obtained by decoding the encoded frames just before and after a boundary between pieces of music is adjusted.

As shown in FIG. 5, the volume level extracted from the auxiliary data by the volume level extracting means 201 is "1" for frame #n, which belongs to the first piece of music, and "2" for frame #n+1, which belongs to the second piece of music. Since the volume magnification for volume level "1" is 1.0, as shown in FIG. 4, the volume adjusting means 203 outputs decoded frame #n with its audio amplitude multiplied by 1.0. Since the volume magnification for volume level "2" is 0.9, as shown in FIG. 4, the volume adjusting means 203 outputs decoded frame #n+1 with its audio amplitude multiplied by 0.9.

As a result, although the amplitude of frame #n+1 of the second piece, as decoded by the decoding means 202, is larger than that of frame #n of the first piece, the amplitude of frame #n+1 output from the volume adjusting means 203 is the same as that of frame #n. The magnification may also be changed gradually, even within a single frame, so that the volume changes smoothly. In this way, based on the volume level information contained in the auxiliary data output from the audio encoding device 100, volume differences between pieces of music can be brought to a constant level, which reduces the discomfort caused by changes in audio amplitude during playback.
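The option of changing the magnification gradually within a frame could look like the sketch below. The linear ramp is an assumption made for illustration; the description does not specify the shape of the transition, and `ramp_gain` is a hypothetical name.

```python
def ramp_gain(samples, start_gain, end_gain):
    """Apply a gain that moves linearly from start_gain to end_gain over the frame."""
    count = len(samples)
    if count == 0:
        return []
    if count == 1:
        return [samples[0] * end_gain]
    step = (end_gain - start_gain) / (count - 1)
    return [sample * (start_gain + step * index)
            for index, sample in enumerate(samples)]


# Move from x1.0 (previous frame) to x0.9 (this frame) across five samples.
smoothed = ramp_gain([100, 100, 100, 100, 100], 1.0, 0.9)
```

A real frame holds far more samples than five, so the per-sample step would be correspondingly smaller.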

As described above, the audio encoding device according to Embodiment 1 includes: encoding means 101 that encodes input audio data and outputs encoded data, and also outputs encoding-related data concerning the encoding; volume level output means 102 that outputs the volume level of the audio data; and auxiliary data output means 103 that creates auxiliary data containing volume level information from the detected volume level and the encoding-related data. The device can therefore output auxiliary data containing the volume level information together with the encoded data of the audio data, which makes it possible to process the audio data using that information.

The audio decoding device according to Embodiment 1 includes: decoding means 202 that decodes the encoded data output from the audio encoding device and outputs audio data; volume level extracting means 201 that extracts the volume level from the auxiliary data, containing volume level information, output from the audio encoding device; and volume adjusting means 203 that adjusts the volume of the audio data output from the decoding means 202 based on the volume level extracted by the volume level extracting means 201, and outputs the result. The device can therefore process the audio data using the volume level information extracted from the auxiliary data and adjust the amplitude of the decoded audio data.

In Embodiment 1 above, the auxiliary data output means embeds the volume level information in the 2-bit “private_bit” contained in the header area of the audio frame, but the information may instead be embedded in another area, such as the ancillary_data area of the audio frame. An area that the system reserves, or that has spare capacity, may also be used. For example, under the DVD-Video standard shown in FIG. 8, areas can be used for auxiliary data as follows. The DVD-Video standard manages data in units called VOBUs. Every VOBU contains system information called the NV_PCK pack. The NV_PCK pack contains playback control information called the PCI_PCK pack, whose last 18 bytes are a Reserved area with no defined use. This Reserved area can therefore be used for auxiliary data. An area that a standard does define, but that the system does not use, can likewise be used for auxiliary data. Auxiliary data other than the encoded audio data in the SD standard or the Blu-ray standard can also be used.

In Embodiment 1 above, audio data is encoded and decoded according to the MPEG1-Layer II rules, but the encoding and decoding rules are not limited to this.

(Embodiment 2)
An audio encoding device and an audio decoding device according to Embodiment 2 of the present invention are described below.

FIG. 6 is a block diagram showing the configuration of the audio encoding device according to Embodiment 2. In FIG. 6, 601 denotes encoding means that encodes input audio data and outputs encoded data, 602 denotes category classification means that classifies the input audio data into a plurality of categories, and 603 denotes auxiliary data output means that creates and outputs auxiliary data relating to the audio data. The audio encoding device 600 comprises the encoding means 601, the category classification means 602, and the auxiliary data output means 603.

FIG. 7 is a block diagram showing the configuration of the audio decoding device according to Embodiment 2. In FIG. 7, 701 denotes category extraction means that extracts category information from the input auxiliary data and outputs it, and 702 denotes decoding means that decodes encoded data and outputs audio data. 704 denotes category designating means that designates, from among the plurality of categories, the category whose audio output is to be controlled. 703 denotes audio output means that controls the audio output of the audio data from the decoding means 702 based on the category information and the output of the category designating means 704. The audio decoding device 700 comprises the category extraction means 701, the decoding means 702, the audio output means 703, and the category designating means 704.

FIG. 8 is a diagram showing the data format of the DVD-Video standard. The DVD-Video standard manages data in units called VOBUs. Every VOBU contains system information called the NV_PCK pack. The NV_PCK pack contains playback control information called the PCI_PCK pack, whose last 18 bytes form a Reserved (RSV) area with no defined use.

Next, the operation of the audio encoding device and the audio decoding device according to Embodiment 2 is described.
First, the operation of the audio encoding device 600 according to Embodiment 2 is described.

As shown in FIG. 6, when audio is input, the encoding means 601 encodes the audio data frame by frame, where each frame has a predetermined time length determined by the sampling rate, and outputs the encoded data. The encoding means 601 also outputs encoding-related data concerning this encoding to the auxiliary data output means 603. Even under the DVD-Video standard, if the audio is MPEG1-Layer II, the encoding-related data is essentially the same as in Embodiment 1, so a detailed description of it is omitted.

The category classification means 602 divides the input audio data into frames, which are the encoding units of the encoding means 601, classifies each frame into one of a plurality of categories, and outputs category information indicating the category of each frame to the auxiliary data output means 603. To find the frame boundaries in the audio data, the category classification means 602 can count the audio samples and detect the moment at which the count equals the number of samples in a frame. Alternatively, it can receive a synchronization signal indicating the frame boundaries from the encoding means 601.
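The sample-counting approach to finding frame boundaries can be sketched as follows. The figure of 1152 samples per frame is the MPEG-1 Layer II frame size; the `classify` callback and the function name `classify_frames` are hypothetical, since the description does not say how a frame is assigned to a category.

```python
SAMPLES_PER_FRAME = 1152  # samples per channel in one MPEG-1 Layer II frame


def classify_frames(samples, classify, frame_len=SAMPLES_PER_FRAME):
    """Count samples, and classify each frame once the count reaches frame_len."""
    categories = []
    frame = []
    for sample in samples:
        frame.append(sample)
        if len(frame) == frame_len:  # count matches the frame size: a boundary
            categories.append(classify(frame))
            frame = []
    return categories  # a trailing partial frame is ignored in this sketch


def silent_or_not(frame):
    """Toy classifier, for illustration only: 0 if silent, 1 otherwise."""
    return 1 if any(frame) else 0


example = classify_frames([0] * 1152 + [1] * 1152 + [0] * 10, silent_or_not)
```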

FIG. 9 is a diagram showing an example of the correspondence between audio data frames and categories in Embodiment 2. In FIG. 9, the upper row shows frame numbers, the middle row shows category values, and the lower row shows the audio data. In this example, audio data containing English speech and Japanese speech is classified frame by frame: Japanese speech is category “2”, speech of specific English words is category “1”, and English speech other than the specific words is category “0”. As shown in the figure, frames 10, 11, 21, 22, and 23 are classified into category “0”, frames 12, 13, 24, and 25 into category “1”, and frames 14 to 20 and 26 to 31 into category “2”.

When the auxiliary data output means 603 receives the encoding-related data from the encoding means 601 and the category information from the category classification means 602, it converts them into the DVD-Video data format of FIG. 8 and outputs the result as auxiliary data.

In Embodiment 2, the PCI_PCK pack within the NV_PCK pack of DVD-Video corresponds to the auxiliary data. When assembling the PCI_PCK information of FIG. 8, the auxiliary data output means 603 pairs each frame number with its category value, based on the input category information, and embeds the pairs sequentially in the RSV area. For example, the frame number of each classified frame is represented by a 4-bit signal and its category by a 2-bit signal, and for each frame the 4-bit and 2-bit signals are embedded as a pair, in order, in the RSV area. If the category value is the same over several consecutive frames, the category information can be compressed by storing the category value once, followed by the number of consecutive frames in that category, instead of storing the category value for every frame. This arrangement is particularly effective when frames of the same category often occur in long runs.
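The two storage schemes described above, 6-bit frame/category pairs and run-length compression, can be sketched as follows. The bit order (most significant bit first, frame number before category) and the zero padding to a byte boundary are assumptions, as the description does not fix a layout. Because a 4-bit field only holds values 0 to 15, the sketch treats the frame number as a small index relative to some reference point, which is also an assumption.

```python
from itertools import groupby

RSV_BYTES = 18                                # size of the Reserved area
RSV_CAPACITY_PAIRS = (RSV_BYTES * 8) // 6     # 6-bit pairs that fit: 24


def pack_frame_category_pairs(pairs):
    """Pack (frame_number, category) pairs as 4-bit + 2-bit fields, MSB first."""
    bits = 0
    bit_count = 0
    for frame_number, category in pairs:
        if not (0 <= frame_number < 16 and 0 <= category < 4):
            raise ValueError("frame number needs 4 bits, category needs 2 bits")
        bits = (bits << 6) | (frame_number << 2) | category
        bit_count += 6
    padding = (-bit_count) % 8                # pad with zeros to a whole byte
    return (bits << padding).to_bytes((bit_count + padding) // 8, "big")


def run_length_encode(categories):
    """Collapse consecutive equal categories into (category, run_length)."""
    return [(value, len(list(group))) for value, group in groupby(categories)]


packed = pack_frame_category_pairs([(0, 0), (1, 0), (2, 1), (3, 1)])
runs = run_length_encode([0, 0, 1, 1, 2, 2, 2])
```

With uncompressed 6-bit pairs, the 18-byte area holds at most 24 frames, which is why the run-length form helps when categories change rarely.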

The auxiliary data output means 603 outputs the PCI_PCK pack once it is complete. Under the DVD-Video standard, the PCI_PCK pack may also need video information. In that case, the auxiliary data output means 603 may output only part of the information needed to build the PCI_PCK pack, including the category information to be embedded in the RSV area, and another means in the encoding system may assemble the PCI_PCK pack together with the required video information. In Embodiment 2, the category information is embedded in an area that is open to the user, so it does not affect decoding by a conventional decoding device that does not treat the data in that area as meaningful. In addition, because an area already provided by the standard is used, including the category information does not increase the bit count.

Through the operation described above, the audio encoding device 600 outputs auxiliary data containing the category information of the audio data together with the encoded data of the audio data.

Next, the operation of the audio decoding device 700 according to Embodiment 2 is described.
As shown in FIG. 7, when a PCI_PCK pack is input as auxiliary data, the category extraction means 701 extracts the category information from the RSV area and outputs the extracted categories to the audio output means 703. If the RSV area holds category values together with the number of consecutive frames in each category, the category extraction means 701 interprets this and outputs the category value for each frame to the audio output means 703. Whether the category information is stored per frame or as run lengths of frames in the same category may be agreed in advance between the audio encoding device and the audio decoding device, or the format may include a selection bit indicating whether run lengths are used.
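On the decoding side, turning stored run lengths back into one category value per frame could look like the sketch below. The `use_runs` flag stands in for the selection bit mentioned above; its position and encoding in the real format are not specified, so the function signature is purely illustrative.

```python
def categories_per_frame(entries, use_runs):
    """Return one category value per frame.

    entries is a list of (category, run_length) tuples when use_runs is True,
    or a plain list of per-frame category values when it is False.
    """
    if not use_runs:
        return list(entries)
    expanded = []
    for category, run_length in entries:
        expanded.extend([category] * run_length)
    return expanded


# Runs for the first eleven frames of FIG. 9 (frames 10 to 20).
per_frame = categories_per_frame([(0, 2), (1, 2), (2, 7)], use_runs=True)
```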

When encoded data is input, the decoding means 702 decodes it and outputs the decoded audio to the audio output means 703.

The operation of the audio output means 703 is described using the example of FIG. 9. In that example, when no category whose audio output is to be controlled has been designated, the audio output means 703 outputs the audio data exactly as decoded by the decoding means 702: “I am hungry.”, “私はお腹が空いた。” (the Japanese for “I am hungry.”), “I eat an apple.”, “私はリンゴを食べる。” (the Japanese for “I eat an apple.”).

When an external instruction to output only category “1” is given, the category designating means 704 outputs categories “0” and “2”, whose audio output is to be controlled, as a category designation signal. In response to the output of the category designating means 704, the audio output means 703 suppresses the audio output of the frames belonging to categories “0” and “2”. As a result, only the specific words, “hungry” and “apple”, are output as audio.

When an external instruction not to output only category “2” is given, the category designating means 704 outputs category “2”, whose audio output is to be controlled, as the category designation signal.

In response to the output of the category designating means 704, the audio output means 703 suppresses the audio output of the frames belonging to category “2”. As a result, everything other than the Japanese speech, “I am hungry.” and “I eat an apple.”, is output as audio. In language learning, this makes it possible to listen to the foreign language and the Japanese separately from a single set of encoded data, or to pick out and listen to only the important words.
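The effect of the category designation on the FIG. 9 example can be checked with a short sketch. The frame-to-category assignments come from the description of FIG. 9; the names `FIG9_CATEGORIES` and `frames_to_output` are hypothetical.

```python
# Frame number -> category, as listed for FIG. 9.
FIG9_CATEGORIES = {}
for frames, category in [
    ((10, 11, 21, 22, 23), 0),                              # other English speech
    ((12, 13, 24, 25), 1),                                  # specific English words
    (tuple(range(14, 21)) + tuple(range(26, 32)), 2),       # Japanese speech
]:
    for frame in frames:
        FIG9_CATEGORIES[frame] = category


def frames_to_output(frame_categories, suppressed):
    """Return the frame numbers, in order, whose category is not suppressed."""
    return [frame for frame, category in sorted(frame_categories.items())
            if category not in suppressed]


only_keywords = frames_to_output(FIG9_CATEGORIES, {0, 2})   # "output category 1 only"
without_japanese = frames_to_output(FIG9_CATEGORIES, {2})   # "do not output category 2"
```

Passing the set of suppressed categories mirrors the designation signal, which names the categories to be controlled rather than the ones to keep.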

As described above, the audio encoding device 600 according to Embodiment 2 includes: encoding means 601 that encodes input audio data frame by frame and outputs encoded data, and also outputs encoding-related data concerning the encoding; category classification means 602 that classifies each frame of the audio data into one of a plurality of categories and outputs category information indicating the category of each frame; and auxiliary data output means 603 that creates auxiliary data containing the category information from the category information and the encoding-related data. The device can therefore output auxiliary data containing the category information together with the encoded data of the audio data, which makes it possible to process the audio data using the category information.

The audio decoding device 700 according to Embodiment 2 includes: decoding means 702 that decodes the encoded data output from the audio encoding device and outputs audio data; category extraction means 701 that extracts the category information from the auxiliary data, containing category information, output from the audio encoding device; category designating means 704 that designates, from among the plurality of categories, the category whose audio output is to be controlled; and audio output means 703 that controls the audio output of the frames belonging to the designated category, based on the category information extracted by the category extraction means 701 and the designated category output by the category designating means 704. The device can therefore process the audio data using the category information extracted from the auxiliary data and control the audio output of speech belonging to a given category.

Embodiment 2 above describes the case of three categories, but the number of categories is not limited to this.

Embodiment 2 above describes the cases of outputting only the category “1” data and of not outputting the category “2” data, but other instructions to output or suppress particular categories are also possible.

In Embodiment 2 above, the auxiliary data output means embeds the category information in the Reserved area of the PCI_PCK pack within the NV_PCK pack of a DVD-Video VOBU, but the information may instead be embedded in another area, such as the 2-bit “private_bit” contained in the header area of the audio frame.

In Embodiment 2, audio data is encoded and decoded according to the MPEG1-Layer II rules and the system conforms to the DVD-Video standard, but the encoding and decoding rules and the system standard are not limited to these.

(Embodiment 3)
An audio encoding device and an audio decoding device according to Embodiment 3 of the present invention are described below.
In Embodiment 3, the configuration of the audio encoding device is the same as that of Embodiment 2, while the configuration of the audio decoding device differs from that of Embodiment 2.

FIG. 10 is a block diagram showing the configuration of the audio decoding device according to Embodiment 3. 1001 denotes category extraction means that extracts category information from the input auxiliary data and outputs it, and 1004 denotes output prohibition category designating means that designates, from among the plurality of categories, the category whose audio output is to be prohibited. 1002 denotes decoding means that decodes the encoded data based on the category information and the output of the output prohibition category designating means 1004 and outputs audio data, and 1003 denotes audio output means that controls the audio output of the audio data from the decoding means 1002. The audio decoding device 1000 comprises the category extraction means 1001, the decoding means 1002, the audio output means 1003, and the output prohibition category designating means 1004.

Next, the operation of the audio decoding device 1000 according to Embodiment 3 is described using the example of FIG. 9.
The category extraction means 1001 operates in the same way as the category extraction means 701 of Embodiment 2, so its description is omitted.

The decoding means 1002 operates as follows in response to external instructions.
First, when nothing is designated externally, the output prohibition category designating means 1004 outputs no category for which audio output is prohibited, and the decoding means 1002, like the decoding means 702 of Embodiment 2, decodes all the encoded data and outputs the decoded audio to the audio output means 1003.

When an external instruction to output only category “1” is given, the output prohibition category designating means 1004 outputs categories “0” and “2”, whose audio output is to be prohibited, as a category designation signal. In response, the decoding means 1002 obtains the number of frames per second from “sampling_frequency” in the header area of the audio frame shown in FIG. 3 and the transfer rate per second from “bitrate_index”, and from the two calculates the number of bits per frame. Once the number of bits per frame is known, the decoding means 1002 skips that many bits of encoded data without decoding them. For example, when instructed externally to output only category 1, then as shown in FIG. 9, frames 10 and 11 (category 0), frames 14 to 20 (category 2), frames 21 to 23 (category 0), and frames 26 to 31 (category 2) are skipped without being decoded.
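The bits-per-frame arithmetic can be worked through as follows. This sketch assumes that the header fields have already been resolved to an actual sampling rate and bit rate (both fields are table indices in the MPEG-1 header), and uses the standard Layer II relations: 1152 samples per frame, and a frame length of 144 × bit rate / sampling rate bytes plus an optional padding byte. The function names are hypothetical.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer II


def frames_per_second(sample_rate_hz):
    """Frames per second follow from the sampling rate alone."""
    return sample_rate_hz / SAMPLES_PER_FRAME


def frame_length_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Bytes occupied by one frame: bit rate divided by frames per second."""
    return (SAMPLES_PER_FRAME // 8) * bitrate_bps // sample_rate_hz + padding


def bytes_to_skip(frame_count, bitrate_bps, sample_rate_hz):
    """Offset to jump over frame_count unpadded frames without decoding them."""
    return frame_count * frame_length_bytes(bitrate_bps, sample_rate_hz)


# 192 kbit/s at 48 kHz: about 41.7 frames per second, 576 bytes (4608 bits) per frame.
size_48k = frame_length_bytes(192_000, 48_000)
# Skipping frames 14 to 20 of FIG. 9 (seven frames) at that rate.
skip_seven = bytes_to_skip(7, 192_000, 48_000)
```

At 44.1 kHz the frame length is not a whole number of bytes, so individual frames carry a padding byte as signalled in their headers; a skip routine would need to read each header rather than multiply, which this sketch does not do.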

Besides the method above, the encoded data may be skipped by scanning forward until the “syncword” pattern in the header area of the audio frame shown in FIG. 3 has appeared as many times as there are frames to skip. As a result, the decoding means 1002 outputs the audio data decoded from the encoded data of frames 12, 13, 24, and 25, and the audio output means 1003 outputs this decoded audio data as audio.
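The syncword-based alternative can be sketched as below. In MPEG-1 audio the syncword is twelve consecutive 1 bits at the start of the header, so the sketch looks for a byte 0xFF followed by a byte whose upper four bits are all set. It assumes byte-aligned frames and, as a simplification, that the pattern never occurs inside frame payloads; a robust parser would validate the remaining header fields. The function name is hypothetical.

```python
def skip_frames_by_syncword(data, start, frames_to_skip):
    """Return the offset of the frame header reached after skipping frames.

    start must point at the header of the first frame to skip. Returns None
    if the data ends before enough syncwords are found.
    """
    position = start
    syncwords_seen = 0
    while position + 1 < len(data):
        if data[position] == 0xFF and (data[position + 1] & 0xF0) == 0xF0:
            if syncwords_seen == frames_to_skip:
                return position        # header of the first frame to decode
            syncwords_seen += 1
            position += 2              # step past this syncword
        else:
            position += 1
    return None


# Three toy "frames", each starting with the bytes 0xFF 0xFD.
stream = bytes([0xFF, 0xFD, 1, 2, 3, 0xFF, 0xFD, 4, 5, 0xFF, 0xFD, 6])
next_header = skip_frames_by_syncword(stream, 0, 2)
```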

If the number of frames skipped without decoding varies, the intervals at which audio data is output to the audio output means 1003 may become uneven. In Embodiment 3, the audio output means 1003 therefore controls the output, according to the output-prohibited category from the output prohibition category designating means 1004, so that the periods corresponding to data that the decoding means 1002 left undecoded have a constant length. Such control can be achieved, for example, by accumulating the output data of the decoding means 1002 in a buffer memory or the like for a predetermined time, excluding the periods corresponding to the portions that the decoding means 1002 did not decode, and then outputting the accumulated data in order.

As described above, the audio decoding device 1000 according to Embodiment 3 includes: category extraction means 1001 that extracts category information from the auxiliary data output from the audio encoding device; output prohibition category designating means 1004 that designates, from among the plurality of categories, the category that is not to be output as audio; and decoding means that, based on the category information extracted by the category extraction means 1001 and the output of the output prohibition category designating means 1004, decodes the encoded data output from the audio encoding device, except for the encoded data of frames belonging to the category output by the output prohibition category designating means 1004, and outputs audio data. When decoding and outputting the encoded data from the audio encoding device, the device can therefore use the category information to control the audio output of speech belonging to a given category.

Embodiment 3 above describes the case of three categories, but the number of categories is not limited to this.

In Embodiment 3 above, the auxiliary data output means embeds the category information in the Reserved area of the PCI_PCK pack within the NV_PCK pack of a DVD-Video VOBU, but the information may instead be embedded in another area, such as the 2-bit “private_bit” contained in the header area of the audio frame.

In Embodiment 3, audio data is encoded and decoded according to the MPEG1-Layer II rules, but the encoding and decoding rules are not limited to this.

(Embodiment 4)
An audio encoding device and an audio decoding device according to Embodiment 4 of the present invention are described below.

FIG. 11 is a block diagram showing the configuration of the audio encoding device according to Embodiment 4 of the present invention. In FIG. 11, 1101 denotes encoding means that encodes audio data from a plurality of audio sound sources and outputs encoded data, and 1102 denotes auxiliary data output means that creates and outputs auxiliary data relating to the audio data from the band data and the encoding-related data output by the encoding means 1101. The audio encoding device 1100 comprises the encoding means 1101 and the auxiliary data output means 1102.

FIG. 12 is a block diagram showing the configuration of the audio decoding device according to Embodiment 4 of the present invention. In FIG. 12, 1201 denotes band data extraction means that extracts band data from the input auxiliary data, and 1202 denotes decoding means that decodes the input encoded data. 1203 denotes sound source designating means that designates, from among the plurality of audio sound sources, the audio sound source whose audio output is to be controlled. The audio decoding device 1200 comprises the band data extraction means 1201, the decoding means 1202, and the sound source designating means 1203.

次に本実施の形態４によるオーディオ符号化装置、及びオーディオ復号化装置の動作を説明する。
まず本実施の形態４によるオーディオ符号化装置１１００の動作について説明する。本実施の形態４では、符号化手段１１０１に楽器音源のオーディオデータと音声音源のオーディオデータを入力する場合を想定する。なお、符号化手段１１０１へのオーディオ音源の入力は２入力に限るものではなく、楽器と音声に限るものでもない。例えば、符号化手段１１０１への入力を３入力とし、１つはピアノ、１つはバイオリン、１つは拍手といった３つのオーディオ音源を入力することも想定できる。 Next, operations of the audio encoding device and the audio decoding device according to the fourth embodiment will be described.
First, the operation of the audio encoding device 1100 according to the fourth embodiment will be described. In the fourth embodiment, it is assumed that audio data of a musical instrument sound source and audio data of a voice sound source are input to the encoding means 1101. The input to the encoding means 1101 is not limited to two audio sound sources, nor to a musical instrument and a voice. For example, three audio sound sources could be input to the encoding means 1101: one piano, one violin, and one applause source.

ＭＰＥＧ１−ＬａｙｅｒIIの場合、符号化するときに、オーディオの帯域を“ｓｂ（サブバンド）”として０〜３１までの３２個に分割できる。本実施の形態４ではこのうち音声音源のオーディオデータの符号化に“ｓｂ”を２〜８まで使用したとする。符号化手段１１０１は符号化した符号化データを出力し、さらに、音声音源のオーディオデータが符号化の際の複数のオーディオ帯域“ｓｂ”０〜３１までのうちの“ｓｂ”２〜８のオーディオ帯域に盛り込まれたことを示す帯域データする。帯域データとして“ｓｂ”の２〜８を補助データ出力手段１１０２に出力する。 In MPEG1-Layer II, the audio band can be divided at encoding time into 32 subbands ("sb") numbered 0 to 31. In the fourth embodiment, it is assumed that "sb" 2 to 8 are used to encode the audio data of the voice sound source. The encoding means 1101 outputs the encoded data and, in addition, generates band data indicating that the audio data of the voice sound source was placed in "sb" 2 to 8 out of the audio bands "sb" 0 to 31 used in encoding. It outputs "sb" 2 to 8 to the auxiliary data output means 1102 as the band data.
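The band data described above can be pictured as an inclusive subband range. The following is a minimal sketch, not the format used by the invention: the `BandData` type and its field names are hypothetical, and the 48 kHz sampling rate is an assumption chosen only to show which frequencies "sb" 2 to 8 would cover when the 32 subbands are of equal width.

```python
from dataclasses import dataclass

NUM_SUBBANDS = 32  # MPEG1-Layer II divides the audio band into 32 subbands


@dataclass(frozen=True)
class BandData:
    """Inclusive subband range occupied by one audio source (hypothetical type)."""
    first_sb: int
    last_sb: int

    def __post_init__(self):
        if not (0 <= self.first_sb <= self.last_sb < NUM_SUBBANDS):
            raise ValueError("subband range must lie within 0..31")

    def frequency_range_hz(self, sample_rate_hz):
        # Each subband nominally spans (sample_rate / 2) / 32 Hz.
        width = sample_rate_hz / (2 * NUM_SUBBANDS)
        return self.first_sb * width, (self.last_sb + 1) * width


# Voice sound source placed in "sb" 2 to 8, as in this embodiment.
voice_band = BandData(first_sb=2, last_sb=8)
# 48 kHz is an assumed sampling rate, not stated in the text.
low_hz, high_hz = voice_band.frequency_range_hz(48_000)
```

Under these assumptions each subband is 750 Hz wide, so the voice sound source would occupy roughly 1.5 kHz to 6.75 kHz.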

本実施の形態４によるオーディオ符号化装置において、音声音源のオーディオデータの符号化の際に“ｓｂ”２と８の帯域を削減し、“ｓｂ”３と４の符号化データの振幅を大きくする、といった加工を行うことも可能である。これにより、音声音源のオーディオデータをより狭い帯域に盛り込むことができる。ここで符号化手段が指定されたオーディオ音源の帯域を削減するのに、オーディオ音源の帯域の両端を削減しているが、これに限られるものではなく、オーディオ音源の帯域の一端側のみで削減するようにしてもよい。また、このように符号化の際に指定されたオーディオ音源の帯域を削減する場合は、別の帯域で補正をすることもできる。 In the audio encoding device according to the fourth embodiment, it is also possible, when encoding the audio data of the voice sound source, to apply processing such as removing the "sb" 2 and "sb" 8 bands and increasing the amplitude of the encoded data in "sb" 3 and "sb" 4. This allows the audio data of the voice sound source to be confined to a narrower band. Here the encoding means trims both ends of the band of the designated audio sound source, but this is not a limitation; the band may be trimmed at one end only. When the band of the designated audio sound source is reduced at encoding time in this way, a compensating correction can also be applied in another band.
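The band-narrowing step can be sketched as follows. This is an illustration under stated assumptions rather than the encoder of the embodiment: the function name `narrow_source_band`, the representation of a frame as 32 lists of subband samples, and the gain of 1.5 are all hypothetical. The text only says that "sb" 2 and 8 are removed and that the amplitude in "sb" 3 and 4 is increased.

```python
def narrow_source_band(subband_samples, drop=(2, 8), boost=(3, 4), gain=1.5):
    """Return a copy of one source's subband samples with the edge bands removed.

    subband_samples: list of 32 lists of floats, one list per subband.
    drop:  subbands at the edges of the source's band that are emptied.
    boost: subbands whose amplitude is raised to compensate (gain is assumed).
    """
    shaped = [list(samples) for samples in subband_samples]
    for sb in drop:
        shaped[sb] = [0.0] * len(shaped[sb])
    for sb in boost:
        shaped[sb] = [x * gain for x in shaped[sb]]
    return shaped


# One illustrative frame of the voice sound source: 32 subbands, 3 samples each.
frame = [[1.0, -1.0, 0.5] for _ in range(32)]
shaped = narrow_source_band(frame)
```

After this step the voice sound source would occupy "sb" 3 to 7, so the band data reported to the auxiliary data output means would describe that narrower range.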

補助データ出力手段１１０２は、図８のＰＣＩ＿ＰＣＫ情報を組上げるときに、ＲＳＶ領域に順次入力した帯域データを保存する。補助データ出力手段１１０２は、ＰＣＩ＿ＰＣＫパックが完成した段階で補助データを出力する。実施の形態２と同様、補助データ出力手段１１０２から、ＲＳＶ領域に埋め込む帯域データの情報を含む、ＰＣＩ＿ＰＣＫパックの作成に必要な情報の一部を出力し、符号化システムの別の手段により、所定のビデオ情報も含めてＰＣＩ＿ＰＣＫパックを組上げる構成としてもよい。本実施の形態４では、帯域データをユーザに開放されている領域に埋め込むようにしているので、該領域のデータを有意のものとして扱わない従来の復号化装置で復号する場合にその復号動作に影響を及ぼさない。また元々各規格で準備されている領域を使用しているので、帯域データを含めるためにビット量が増加することもない。 When assembling the PCI_PCK information of FIG. 8, the auxiliary data output means 1102 stores the sequentially input band data in the RSV area, and outputs the auxiliary data once the PCI_PCK pack is complete. As in the second embodiment, the auxiliary data output means 1102 may instead output only part of the information needed to create the PCI_PCK pack, including the band data to be embedded in the RSV area, and another means of the encoding system may assemble the PCI_PCK pack together with the predetermined video information. In the fourth embodiment, the band data is embedded in an area that is open to the user, so decoding by a conventional decoding device that does not treat the data in that area as significant is unaffected. Furthermore, because an area already provided by each standard is used, including the band data does not increase the bit amount.
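One way to picture storing band data in a reserved area is shown below. The byte layout is purely an assumption for illustration: the text does not specify how the band data is laid out in the RSV area or how large that area is, so the two-byte (first subband, last subband) encoding, the 4-byte stand-in buffer, and the function names are all hypothetical.

```python
import struct


def pack_band_data(first_sb, last_sb):
    """Encode an inclusive subband range as two unsigned bytes (assumed layout)."""
    if not (0 <= first_sb <= last_sb <= 31):
        raise ValueError("subband range must lie within 0..31")
    return struct.pack("BB", first_sb, last_sb)


def unpack_band_data(reserved):
    """Read the subband range back from the start of a reserved area."""
    first_sb, last_sb = struct.unpack("BB", bytes(reserved[:2]))
    return first_sb, last_sb


# Stand-in for a reserved area; the real size is not given in the text.
rsv_area = bytearray(4)
rsv_area[0:2] = pack_band_data(2, 8)
recovered = unpack_band_data(rsv_area)
```

Because only bytes that were already reserved are overwritten, the total size of the buffer does not change, which mirrors the point that embedding the band data does not increase the bit amount.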

以上の動作により、本実施の形態４によるオーディオ符号化装置１１００は、オーディオデータの符号化データとともに、オーディオデータの帯域データを含む補助データを出力する。 Through the above operation, audio encoding apparatus 1100 according to the fourth embodiment outputs auxiliary data including band data of audio data together with encoded data of audio data.

次に本実施の形態４によるオーディオ復号化装置１２００の動作について説明する。図１２に示すように、帯域データ抽出手段１２０１は補助データから帯域データを抽出し、復号化手段１２０２に出力する。復号化手段１２０２は、オーディオ出力を制御しようとするオーディオ音源が何も指定されていない場合は、すべてのオーディオ帯域“ｓｂ”の符号化データをそのまま復号し、復号したオーディオを出力する。 Next, the operation of the audio decoding apparatus 1200 according to the fourth embodiment will be described. As shown in FIG. 12, the band data extracting unit 1201 extracts band data from the auxiliary data and outputs the band data to the decoding unit 1202. When no audio sound source to be controlled for audio output is designated, the decoding unit 1202 decodes the encoded data of all the audio bands “sb” as they are, and outputs the decoded audio.

外部からオーディオ出力を制御しようとするオーディオ音源の指定があれば、音源指定手段１２０３がオーディオ出力を制御しようとするオーディオ音源を指定する音源指定信号を出力する。復号化手段１２０２は音源指定手段１２０３の出力に応じて、帯域データ抽出手段１２０１から受け取った帯域データが示す、音源指定信号に指定されるオーディオ音源が盛り込まれたオーディオ帯域“ｓｂ”の符号化データをその振幅を半分に制限して復号し、それ以外は通常に復号する。ここで、復号化手段１２０２は、帯域データが示す指定された音源が盛り込まれた帯域の振幅を半分にしているが、これに限らずゼロやその他の値にすることも可能である。このように音声音源のオーディオデータの出力を制御することにより、音声付のオーディオデータをカラオケとして出力することができる。 When an audio sound source whose audio output is to be controlled is designated externally, the sound source designating means 1203 outputs a sound source designation signal identifying that audio sound source. In response to the output of the sound source designating means 1203, the decoding means 1202 decodes the encoded data of the audio bands "sb" that, according to the band data received from the band data extraction means 1201, contain the designated audio sound source with its amplitude limited to half, and decodes the remaining bands normally. Here the decoding means 1202 halves the amplitude of the band containing the designated sound source, but this is not a limitation; the amplitude may be set to zero or to another value. By controlling the output of the voice sound source in this way, audio data that includes vocals can be output as karaoke.
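The decoder-side control can be sketched as a per-subband scaling applied before the subband samples are turned back into audio. This is a simplified illustration, not the decoder of the embodiment: the function name `scale_designated_band`, the frame representation as 32 lists of samples, and the `designated` flag standing in for the sound source designation signal are hypothetical, and the synthesis step of a real MPEG1-Layer II decoder is omitted.

```python
def scale_designated_band(subband_samples, band, designated, scale=0.5):
    """Limit the amplitude of the subbands that carry the designated source.

    subband_samples: list of 32 lists of floats, one list per subband.
    band:            (first_sb, last_sb), inclusive, taken from the band data.
    designated:      True when a sound source has been designated for control.
    scale:           0.5 halves the amplitude; 0.0 would mute the band.
    """
    first_sb, last_sb = band
    if not designated:
        # No source designated: every subband is passed through unchanged.
        return [list(samples) for samples in subband_samples]
    return [
        [x * scale for x in samples] if first_sb <= sb <= last_sb else list(samples)
        for sb, samples in enumerate(subband_samples)
    ]


frame = [[1.0, -2.0] for _ in range(32)]
karaoke = scale_designated_band(frame, band=(2, 8), designated=True)
untouched = scale_designated_band(frame, band=(2, 8), designated=False)
```

With the band data indicating "sb" 2 to 8, only those seven subbands are attenuated, so the accompaniment in the other subbands is reproduced at its original level.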

以上のように本実施の形態４によるオーディオ符号化装置１１００では、複数のオーディオ音源のうちの１つ以上のオーディオ音源について該オーディオ音源のオーディオデータが符号化の際の複数のオーディオ帯域のうちのどのオーディオ帯域に盛り込まれたかを示す帯域データ、及び符号化に関する符号化関連データを、複数のオーディオ音源からのオーディオデータを符号化した符号化データとともに、出力する符号化手段１１０１と、符号化手段１１０１より出力される帯域データと符号化関連データとから、補助データを作成し出力する補助データ出力手段１１０２とを備えることにより、オーディオデータの帯域データの情報を含む補助データをオーディオデータの符号化データとともに出力することができ、帯域データの情報を用いたオーディオデータの加工を可能とできる。 As described above, the audio encoding device 1100 according to the fourth embodiment comprises: encoding means 1101 that outputs, for one or more of a plurality of audio sound sources, band data indicating in which of the plurality of audio bands used in encoding the audio data of that audio sound source was placed, and encoding-related data relating to the encoding, together with encoded data obtained by encoding the audio data from the plurality of audio sound sources; and auxiliary data output means 1102 that creates and outputs auxiliary data from the band data and the encoding-related data output by the encoding means 1101. The device can therefore output auxiliary data containing the band data of the audio data together with the encoded data of the audio data, making it possible to process the audio data using the band data.

また、本実施の形態４によるオーディオ復号化装置１２００では、オーディオ符号化装置より出力される補助データから帯域データを抽出する帯域データ抽出手段１２０１と、オーディオ出力を制御しようとするオーディオ音源を指定する音源指定手段１２０３と、帯域データ抽出手段１２０１により抽出した帯域データと、音源指定手段１２０３の出力とに基づき、音源指定手段１２０３により指定されるオーディオ音源のオーディオデータが盛り込まれた帯域の符号化データを、そのオーディオデータの振幅を制限して復号して出力する復号化手段１２０２とを備えることにより、補助データから抽出した帯域データの情報を用いてオーディオデータを加工し、復号したオーディオデータのオーディオ出力を制御することができる。 Likewise, the audio decoding device 1200 according to the fourth embodiment comprises: band data extraction means 1201 that extracts the band data from the auxiliary data output by the audio encoding device; sound source designating means 1203 that designates the audio sound source whose audio output is to be controlled; and decoding means 1202 that, based on the band data extracted by the band data extraction means 1201 and the output of the sound source designating means 1203, decodes and outputs the encoded data of the band containing the audio data of the designated audio sound source with the amplitude of that audio data limited. The device can therefore process the audio data using the band data extracted from the auxiliary data and control the audio output of the decoded audio data.

なお、上記実施の形態４では、補助データ出力手段が、カテゴリの情報を、ＤＶＤ−ＶｉｄｅｏのＶＯＢＵのＮＶ＿ＰＣＫパック中のＰＣＩ＿ＰＣＫパックのＲｅｓｅｒｖｅｄ領域に埋め込むようにしたものについて説明したが、オーディオフレームのヘッダ領域に含まれる２ビットの“ｐｒｉｖａｔｅ＿ｂｉｔ”等、他の領域に埋め込むようにしてもよい。 In the fourth embodiment, the auxiliary data output means has been described as embedding the category information in the Reserved area of the PCI_PCK pack within the NV_PCK pack of a DVD-Video VOBU; however, the information may instead be embedded in another area, such as the 2-bit "private_bit" included in the header area of the audio frame.

また、本実施の形態４では、オーディオデータの符号化、復号化をＭＰＥＧ１−ＬａｙｅｒIIの規則に従って行うものとしたが、オーディオデータの符号化、復号化の規則はこれに限られるものではない。 In the fourth embodiment, encoding and decoding of audio data are performed according to the MPEG1-Layer II rule, but the encoding and decoding rules of audio data are not limited to this.

また、上記実施の形態１によるオーディオ符号化装置、実施の形態２または実施の形態３によるオーディオ符号化装置、及び実施の形態４によるオーディオ符号化装置のいずれか２つ、または３つを組み合わせた構成とすることも可能であり、また、実施の形態１によるオーディオ復号化装置、実施の形態２によるオーディオ復号化装置、及び実施の形態４によるオーディオ復号化装置のいずれか２つ、または３つを組み合わせた構成とすることも可能である。 It is also possible to combine any two or all three of the audio encoding device according to Embodiment 1, the audio encoding device according to Embodiment 2 or Embodiment 3, and the audio encoding device according to Embodiment 4. Likewise, any two or all three of the audio decoding device according to Embodiment 1, the audio decoding device according to Embodiment 2, and the audio decoding device according to Embodiment 4 may be combined.

本発明は、オーディオ符号化装置において、符号化データとともに、オーディオデータの音量レベル、オーディオデータのフレーム毎のカテゴリ、又はオーディオ音源が盛り込まれた帯域を示すデータを含む補助データを出力し、オーディオ復号化装置において、補助データに含まれた情報を用いてオーディオデータそのものを加工し、復号後のオーディオデータを加工して出力できるようにしたものであり、利用価値の高いオーディオ符号化装置、及びオーディオ復号化装置を提供する上で有用である。 According to the present invention, the audio encoding device outputs, together with the encoded data, auxiliary data containing the volume level of the audio data, the category of each frame of the audio data, or data indicating the band in which an audio sound source was placed, and the audio decoding device uses the information in the auxiliary data to process the audio data itself and to process the decoded audio data for output. The invention is therefore useful for providing an audio encoding device and an audio decoding device of high practical value.

本発明の実施の形態１にかかるオーディオ符号化装置の構成を示すブロック図である。 FIG. 1 is a block diagram showing the configuration of the audio encoding device according to Embodiment 1 of the present invention.
本発明の実施の形態１にかかるオーディオ復号化装置の構成を示すブロック図である。 FIG. 2 is a block diagram showing the configuration of the audio decoding device according to Embodiment 1 of the present invention.
ＭＰＥＧ１−ＬａｙｅｒII規格のフォーマットを示す図である。 FIG. 3 is a diagram showing the format of the MPEG1-Layer II standard.
本発明の実施の形態１にかかるオーディオ復号化装置の音量レベル抽出手段の音量レベルを示す図である。 FIG. 4 is a diagram showing the volume level of the volume level extraction means of the audio decoding device according to Embodiment 1 of the present invention.
本発明の実施の形態１にかかるオーディオ復号化装置の音量調節手段がオーディオの振幅を変更することを示す図である。 FIG. 5 is a diagram showing how the volume adjustment means of the audio decoding device according to Embodiment 1 of the present invention changes the audio amplitude.
本発明の実施の形態２にかかるオーディオ符号化装置の構成を示すブロック図である。 FIG. 6 is a block diagram showing the configuration of the audio encoding device according to Embodiment 2 of the present invention.
本発明の実施の形態２にかかるオーディオ復号化装置の構成を示すブロック図である。 FIG. 7 is a block diagram showing the configuration of the audio decoding device according to Embodiment 2 of the present invention.
ＤＶＤ−Ｖｉｄｅｏ規格のフォーマットを示す図である。 FIG. 8 is a diagram showing the format of the DVD-Video standard.
本発明の実施の形態２にかかるオーディオ符号化装置のカテゴリ分類手段のカテゴリを示す図である。 FIG. 9 is a diagram showing the categories of the category classification means of the audio encoding device according to Embodiment 2 of the present invention.
本発明の実施の形態３にかかるオーディオ復号化装置の構成を示すブロック図である。 FIG. 10 is a block diagram showing the configuration of the audio decoding device according to Embodiment 3 of the present invention.
本発明の実施の形態４にかかるオーディオ符号化装置の構成を示すブロック図である。 FIG. 11 is a block diagram showing the configuration of the audio encoding device according to Embodiment 4 of the present invention.
本発明の実施の形態４にかかるオーディオ復号化装置の構成を示すブロック図である。 FIG. 12 is a block diagram showing the configuration of the audio decoding device according to Embodiment 4 of the present invention.
ＣＤ上のフォーマットを示す図である。 FIG. 13 is a diagram showing the format on a CD.
従来のオーディオ復号化装置のフローチャートを示す図である。 FIG. 14 is a diagram showing a flowchart of a conventional audio decoding device.

Explanation of symbols

１０１符号化手段 101 Encoding means
１０２音量レベル出力手段 102 Volume level output means
１０３補助データ出力手段 103 Auxiliary data output means
２０１音量レベル抽出手段 201 Volume level extraction means
２０２復号化手段 202 Decoding means
２０３音量調節手段 203 Volume adjustment means
６０１符号化手段 601 Encoding means
６０２カテゴリ分類手段 602 Category classification means
６０３補助データ出力手段 603 Auxiliary data output means
７０１カテゴリ抽出手段 701 Category extraction means
７０２復号化手段 702 Decoding means
７０３オーディオ出力手段 703 Audio output means
１００１カテゴリ抽出手段 1001 Category extraction means
１００２復号化手段 1002 Decoding means
１００３オーディオ出力手段 1003 Audio output means
１１０１符号化手段 1101 Encoding means
１１０２補助データ出力手段 1102 Auxiliary data output means
１２０１帯域データ抽出手段 1201 Band data extraction means
１２０２復号化手段 1202 Decoding means

Claims

1. An audio encoding device that receives audio data and outputs encoded data obtained by encoding the audio data together with auxiliary data related to the audio data, the device comprising:
encoding means for encoding the input audio data and outputting the encoded data, and for outputting encoding-related data relating to the encoding of the audio data;
volume level output means for obtaining and outputting a volume level for each predetermined period of the audio data; and
auxiliary data output means for creating auxiliary data including the volume level from the volume level output by the volume level output means and the encoding-related data output by the encoding means.

2. The audio encoding device according to claim 1, wherein
the predetermined period is one song.

3. The audio encoding device according to claim 1, wherein
the volume level output means obtains the volume level based on the maximum volume and the average volume of the audio data in the predetermined period.

4. An audio decoding device that decodes and outputs encoded data output from the audio encoding device according to claim 1, the device comprising:
decoding means for decoding the encoded data output from the audio encoding device and outputting audio data;
volume level extraction means for extracting the volume level from the auxiliary data output from the audio encoding device; and
volume adjustment means for adjusting the volume of the audio data output by the decoding means based on the volume level extracted by the volume level extraction means, and outputting the adjusted audio data.

5. An audio encoding device that receives audio data and outputs encoded data obtained by encoding the audio data together with auxiliary data related to the audio data, the device comprising:
encoding means for encoding the input audio data frame by frame, a frame being the unit of encoding, and outputting the encoded data, and for outputting encoding-related data relating to the encoding of the audio data;
category classification means for classifying the audio data of each frame into one of a plurality of categories and outputting category information indicating the category into which the frame was classified; and
auxiliary data output means for creating auxiliary data including the category information from the category information output by the category classification means and the encoding-related data output by the encoding means.

6. The audio encoding device according to claim 5, wherein
the auxiliary data includes pairs each consisting of a value indicating a category and the number of consecutive frames classified into that category.

7. An audio decoding device that decodes and outputs encoded data output from the audio encoding device according to claim 5, the device comprising:
decoding means for decoding the encoded data output from the audio encoding device and outputting audio data;
category extraction means for extracting the category information from the auxiliary data output from the audio encoding device;
category designating means for designating, among the plurality of categories, a category whose audio output is to be controlled; and
audio output means for controlling, based on the category information extracted from the auxiliary data and the designated category output from the category designating means, the audio output of those frames of the decoded audio data that belong to the designated category, and outputting the result.

8. The audio decoding device according to claim 7, wherein
the category designating means designates, among the plurality of categories, a category whose audio is not to be output, and
the audio output means controls the audio data output by the decoding means so that audio belonging to the category designated by the category designating means is not output.

9. An audio decoding device that decodes and outputs encoded data output from the audio encoding device according to claim 5, the device comprising:
category extraction means for extracting the category information from the auxiliary data output from the audio encoding device;
output prohibition category designating means for designating, among the plurality of categories, a category whose audio is not to be output; and
decoding means for decoding, based on the category information extracted from the auxiliary data and the output prohibition category output from the output prohibition category designating means, the encoded data output from the audio encoding device excluding the encoded data of frames belonging to the output prohibition category, and outputting audio data.

10. The audio decoding device according to claim 9, further comprising
audio output means for controlling the output of the audio data decoded by the decoding means so that the audio data is output at a constant period, in accordance with the encoded data that the decoding means does not decode as a result of the output of the output prohibition category designating means.

11. An audio encoding device that receives audio data from a plurality of audio sound sources and outputs encoded data obtained by encoding the audio data together with auxiliary data related to the audio data, the device comprising:
encoding means for outputting, for one or more of the plurality of audio sound sources, band data indicating in which of a plurality of audio bands used in encoding the audio data from that audio sound source was placed, and encoding-related data relating to the encoding, together with the encoded data obtained by encoding the audio data from the plurality of audio sound sources; and
auxiliary data output means for creating auxiliary data including the band data from the band data and the encoding-related data output by the encoding means.

12. The audio encoding device according to claim 11, wherein
the encoding means encodes the audio data of a predetermined audio sound source after limiting it so that it falls within a predetermined audio band in the encoded data, and outputs the encoded data.

13. An audio decoding device that decodes and outputs encoded data output from the audio encoding device according to claim 11, the device comprising:
band data extraction means for extracting the band data from the auxiliary data output from the audio encoding device;
sound source designating means for designating, among the one or more audio sound sources, an audio sound source whose audio output is to be controlled; and
decoding means for decoding and outputting, based on the band data extracted from the auxiliary data and the designated sound source output from the sound source designating means, the encoded data of the band containing the audio data of the designated sound source with the amplitude of that audio data limited.