JP2000347697A

JP2000347697A - Voice record regenerating device and record medium

Info

Publication number: JP2000347697A
Application number: JP11155536A
Authority: JP
Inventors: Shigeru Ota; 茂太田
Original assignee: Nippon Columbia Co Ltd
Current assignee: Nippon Columbia Co Ltd
Priority date: 1999-06-02
Filing date: 1999-06-02
Publication date: 2000-12-15

Abstract

PROBLEM TO BE SOLVED: To easily detect the recorded audio level without decoding the sub-band audio data, by detecting the audio level of an audio signal or of audio data and adding data indicating the detected level to the auxiliary data part of a frame format. SOLUTION: An encode/decode part 110 compresses input audio data and converts it into a signal of a specified frame format. A system microcomputer 103 controls this operation and controls the temporary storage and readout of the audio data undergoing compression/expansion. When the encode/decode part 110 compresses and converts an audio signal, the system microcomputer 103 adds the audio level data of that signal to the auxiliary data part of the specified frame format; when the signal is read out, it detects the audio level data added to the auxiliary data part and changes the gain of an electronic volume 108 according to its magnitude.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an audio recording/reproducing apparatus that records and reproduces compressed audio data, and to a recording medium.

[0002]

[Prior Art] MPEG (Moving Picture Experts Group) audio is a representative audio compression method that exploits psychoacoustic characteristics and reduces the amount of information by omitting detail to which hearing is less sensitive. Three audio compression methods, MPEG1, MPEG2, and MPEG4, have been standardized for MPEG audio, and MPEG1 further has three audio compression modes: Layer1, Layer2, and Layer3.

[0003] The MPEG1 audio Layer1 compression method takes 384 samples of audio data as one processing unit, divides the input audio signal into 32 subbands (hereinafter, SBs) of different frequency bands, and quantizes them. It then compresses the audio using a scale factor (hereinafter, SF), which is the ratio obtained when the waveform of the sample data in each subband is normalized so that its maximum amplitude is approximately 1.0, together with bit allocation, which assigns an appropriate number of bits to each SB.

[0004] Layer2 of MPEG1 audio builds on the Layer1 compression processing: it takes 384 × 3 samples of audio data as one processing unit and allocates bits using compression tables prepared for each of several data transfer rates, giving high-quality, high-efficiency audio coding.
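
As a small worked example of the processing units named above, the following sketch converts samples per frame into frame duration. The 32 kHz sampling frequency is only an example value chosen for illustration, and the function name is hypothetical.

```python
def frame_duration_ms(samples_per_frame, fs_hz):
    """Duration of one processing unit in milliseconds."""
    return 1000.0 * samples_per_frame / fs_hz


LAYER1_SAMPLES = 384          # one Layer1 processing unit
LAYER2_SAMPLES = 384 * 3      # one Layer2 processing unit

# Example sampling frequency (an assumption for illustration).
FS_HZ = 32000

print(frame_duration_ms(LAYER1_SAMPLES, FS_HZ))  # Layer1 unit length in ms
print(frame_duration_ms(LAYER2_SAMPLES, FS_HZ))  # Layer2 unit length in ms
```

At this example rate a Layer1 unit covers 12 ms of audio and a Layer2 unit covers 36 ms.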

[0005] Layer2 of MPEG2 audio is basically either an extension of MPEG1 stereo to multichannel or a variant that halves the sampling frequency so that data can be transmitted at a lower rate; its compression method uses the same technology as MPEG1. A description of MPEG2 is therefore omitted.

[0006] Layer3 of MPEG1 audio adds to the Layer2 compression processing the modified discrete cosine transform, a frequency division method that is resistant to aliasing, and variable-length entropy (Huffman) coding, which exploits the skewed occurrence probabilities of the parameters produced during compression, to achieve still more efficient coding. An overview of MPEG audio is given in the Latest MPEG Textbook (最新ＭＰＥＧ教科書, first edition August 1994, ASCII Publishing; the modified discrete cosine transform is described on p. 176 and entropy coding on p. 17).

[0007] Some audio recording/reproducing apparatuses automatically change the output level to match the audio level of the recorded audio data when reproducing the compressed data described above. In that case, the compressed audio data is first expanded and its audio level is then detected.

[0008]

[Problems to Be Solved by the Invention] However, detecting the audio level in this way requires the decoding process used to reproduce the audio data, so complicated processing is needed before the audio level can be detected. An object of the present invention is to provide an audio recording/reproducing apparatus and a recording medium with which the recorded audio level can easily be detected without decoding the SB audio data.

[0009]

[Means for Solving the Problems] The present invention is an audio recording/reproducing apparatus that compresses and expands an audio signal for recording and reproduction, comprising: band dividing means for dividing the audio signal into subbands of different audio frequency bands and quantizing it into audio data; masking threshold calculating means for calculating a masking threshold; level comparing means for comparing the level of the audio data with the masking threshold; bit allocation means for allocating bits to the subbands based on the comparison result of the level comparing means; requantization means for requantizing the audio data with the number of bits allocated to the subbands by the bit allocation means; audio level detecting means for detecting the audio level of the audio signal or the audio data; signal conversion means for converting the signal so that data indicating the audio level detected by the audio level detecting means is added to the auxiliary data part of a frame format; and control means for controlling the gain of an audio level control unit, based on the data indicating the audio level, when the signal in the frame format is reproduced.

[0010] The present invention is also an audio recording/reproducing apparatus that compresses and expands an audio signal for recording and reproduction, in which the bit allocation means calculates the scale factors of the sample data of the subbands, the audio level detecting unit detects the maximum value of the scale factors, and the control means adds the maximum value of the scale factors to the auxiliary data part.

[0011] The present invention is also an audio recording/reproducing apparatus that compresses and expands an audio signal for recording and reproduction, in which the audio level detecting unit detects the audio level of the audio signal or audio data input to the audio recording/reproducing apparatus, and the control means adds that audio level to the auxiliary data part.

[0012] The present invention is also a recording medium that records a signal in a frame format in which scale factor data and audio data obtained by compressing an audio signal or audio data are arranged, wherein the maximum value of the scale factors or the average value of the scale factors is arranged and recorded in the auxiliary data part of the frame format.

[0013]

BEST MODE FOR CARRYING OUT THE INVENTION

An audio recording/reproducing apparatus according to an embodiment of the present invention will now be described. FIG. 1 is a block diagram showing the configuration of an audio reproducing apparatus according to one embodiment of the present invention.

[0014] The recording medium 101 records compressed audio data as a signal in the frame format of the MPEG audio standard. In this embodiment, an IC memory is used as the recording medium.

[0015] The interface unit 102 transfers audio data read from the recording medium 101 to the encode/decode unit 110, and transfers the audio data expanded by the encode/decode unit 110 to the audio reproduction circuit unit 112. It also transfers digital audio data input from outside to the encode/decode unit 110, where the data is compressed and converted into a signal of a predetermined frame format, and records the result on the recording medium 101.

[0016] The audio reproduction circuit unit 112 comprises a digital filter (D/F) 106, a digital-to-analog converter (D/A) 107, an electronic volume 108, and a buffer circuit 109.

[0017] The encode/decode unit 110 applies the MPEG1 compression processing described later to the input audio data received from the interface unit 102 and converts it into a signal of a predetermined frame format. It also expands compressed audio data that has been converted into a signal of the predetermined frame format.

[0018] The system microcomputer 103 controls the operation of the encode/decode unit 110 and controls, among other things, the temporary storage of audio data undergoing compression/expansion in the memory 111 and its readout. When the encode/decode unit 110 compresses an audio signal and converts it into a signal of the predetermined frame format, the system microcomputer 103 adds the audio level data of the audio signal to the auxiliary data part of that frame format; when the signal of the predetermined frame format is read out, it detects the audio level data added to the auxiliary data part and changes the gain of the electronic volume 108 according to its magnitude. The system microcomputer 103 also controls the operation of the apparatus as a whole: it receives signals sent from the operation unit 104 to control the operation of the audio recording/reproducing apparatus, and sends display data to the display unit 105.

[0019] The operation unit 104 has instruction buttons for starting and stopping recording and reproduction on the audio recording/reproducing apparatus. The display unit 105 displays the operating state of the operation unit 104, the level of the audio being recorded or reproduced, and so on.

[0020] FIG. 2 illustrates the configuration of the encode/decode unit of the audio recording/reproducing apparatus according to the first embodiment of the present invention.

[0021] The band dividing unit 201 is a signal processing circuit that includes a filter bank, which uses multiple filters to divide the audio frequency band into multiple frequency bands, and that quantizes the input audio data. The input audio data is divided into 32 different SBs and quantized.

[0022] The masking threshold calculation unit 202 calculates the power level of each SB by fast Fourier transform (FFT) processing and obtains the masking threshold using psychoacoustic characteristics. Masking is the condition in which, when two sounds occur at the same time, the quieter sound becomes inaudible because of the louder one; for example, the murmur of a stream can be heard in quiet surroundings but may not be heard in a storm. The masking threshold is the maximum audio level at which the audio signal in a given SB is masked by adjacent sound.
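
The per-SB power calculation can be sketched as follows. This is a minimal illustration, not the unit's actual implementation: it uses a naive discrete Fourier transform in place of an FFT (the result is the same, only slower), groups the lower half of the spectrum into equal-width bands, and does not attempt the psychoacoustic masking model itself. The function name and the 64-sample block length are assumptions.

```python
import cmath
import math


def subband_power(samples, n_bands=32):
    """Sum spectral power into n_bands equal-width bands (naive DFT)."""
    n = len(samples)
    half = n // 2                       # bins below the Nyquist frequency
    if half == 0 or half % n_bands != 0:
        raise ValueError("block length must be a multiple of 2 * n_bands")
    bins_per_band = half // n_bands
    power = [0.0] * n_bands
    for k in range(half):
        # DFT coefficient for bin k
        x_k = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(samples))
        power[k // bins_per_band] += abs(x_k) ** 2
    return power


# A sine wave that falls exactly on bin 5 of a 64-sample block.
block = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
levels = subband_power(block)
loudest_band = levels.index(max(levels))
```

With the test tone above, nearly all of the power lands in a single band, which is the kind of per-SB level a masking model would then compare against neighbouring bands.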

[0023] The bit allocation/scale factor calculation unit 203 allocates an appropriate number of bits to each SB, judging from the masking threshold output by the masking threshold calculation unit 202 and the level of the audio data in each SB. Once the bits have been allocated, the requantization unit 204 requantizes the audio data with the allocated number of bits and outputs the compressed audio data. The bit allocation/scale factor calculation unit 203 also calculates and outputs the scale factor (SF), which indicates the ratio (the magnification of the compressed data) between the maximum amplitude of the sample data waveform in an SB, when that waveform is normalized so that its maximum amplitude is approximately 1.0, and the maximum amplitude of the audio level represented by the requantized, compressed audio data. The requantized data together with the calculated SF data represent the audio data as an accurate audio signal.
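
The role of the scale factor can be illustrated with a short sketch. It is a simplification under stated assumptions: the SF is taken to be the exact peak amplitude of the block, whereas a real encoder would pick the nearest entry from a fixed SF table, and requantization is left out. The function names are hypothetical.

```python
def scale_factor(samples):
    """Return (sf, normalized) for one subband block.

    sf is the peak absolute amplitude; dividing by it brings the
    block's maximum amplitude to 1.0.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return 0.0, list(samples)       # silent block: nothing to normalize
    return peak, [s / peak for s in samples]


def reconstruct(sf, normalized):
    """Multiply the normalized samples by the SF to restore the level."""
    return [sf * s for s in normalized]


sf, normalized = scale_factor([0.1, -0.25, 0.2])
restored = reconstruct(sf, normalized)
```

The normalized samples carry the waveform shape and the SF carries the level, which is why the SF alone is enough to estimate how loud a block is.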

[0024] The formatting unit 205 converts the synchronization signal, header information containing identifiers for the various MPEG audio modes, the number of bits allocated to each SB as determined by the bit allocation/scale factor calculation unit 203, the SFs, and the audio data requantized by the requantization unit 204 into a signal in the frame format of the MPEG1 audio standard, and outputs it.

[0025] The level comparing unit 206 compares the masking threshold with the level of the quantization noise produced when the audio data is requantized with the number of bits allocated to each SB, and further compares the difference between the signal level and the masking threshold level in each SB with a predetermined reference value. The comparison results are sent to the bit allocation/scale factor calculation unit 203, which determines the number of bits to allocate to each SB based on them.

[0026] Input audio data from the interface unit 102 is fed to the band dividing unit 201 and converted into audio data for each of the 32 SBs. At the same time, the input audio data from the interface unit 102 is fed to the masking threshold calculation unit 202, where FFT (Fast Fourier Transform) processing yields the power level of each SB. From these power levels, the masking threshold calculation unit 202 outputs the masking threshold of each SB resulting from the masking effect.

[0027] In the bit allocation/scale factor calculation unit 203, the masking threshold is compared with the signal level of the audio data in each SB output from the band dividing unit 201, and the number of bits allocated to each SB is determined so that the total stays within a predetermined number of assignable bits. The procedure for allocating bits to the SBs first selects, from the 32 SBs, the SB with the largest dynamic range. Each SB is scanned, the SB with the largest quantization-noise-to-masking-threshold ratio (NMR) is selected, and part of the bit budget is allocated to it. Because the number of bits allocated to an SB changes its quantization noise and hence its NMR, the NMR is recalculated; the SBs are then scanned again, the SB with the largest NMR is selected, part of the bit budget is allocated to it, and the NMR is recalculated once more. This is repeated until all assignable bits have been allocated. The requantization unit 204 then requantizes the audio data of each SB according to the number of bits allocated to it, producing the compressed audio data.
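
The iterative procedure described above can be sketched as a greedy loop. This is an illustration under several assumptions rather than the encoder's actual algorithm: each added bit is assumed to lower the quantization noise by about 6.02 dB, the bit budget is counted in abstract per-SB bits without side information, and the names `allocate_bits`, `smr_db`, and `max_bits` are hypothetical.

```python
def allocate_bits(smr_db, bit_pool, step=1, max_bits=15, db_per_bit=6.02):
    """Greedy allocation: repeatedly give `step` bits to the SB whose
    noise-to-mask ratio (NMR) is currently the largest.

    smr_db[i] is the signal level of SB i minus its masking threshold.
    """
    bits = [0] * len(smr_db)

    def nmr(i):
        # Each allocated bit is assumed to reduce noise by db_per_bit.
        return smr_db[i] - db_per_bit * bits[i]

    while bit_pool >= step:
        candidates = [i for i in range(len(bits)) if bits[i] + step <= max_bits]
        if not candidates:
            break                        # every SB is already at its cap
        worst = max(candidates, key=nmr) # rescan after every allocation
        bits[worst] += step
        bit_pool -= step
    return bits


allocation = allocate_bits([20.0, 5.0, -3.0], bit_pool=4)
```

In this small example the first SB, which exceeds its masking threshold by the largest margin, receives most of the budget, and the SB below its threshold receives nothing.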

[0028] The formatting unit 205 arranges data such as the scale factors (SF) and the audio data so as to form a signal in the frame format defined by the MPEG1 audio standard, and outputs it.

[0029] FIG. 4 illustrates audio data in which the input audio data has been divided into different SBs and allocated bits, in an audio recording/reproducing apparatus that compresses and expands audio for recording and reproduction.

[0030] The vertical axis of FIG. 4(a) shows, for each SB, the audio level of the input audio data and the level of the masking threshold (hatched area) obtained using psychoacoustic characteristics. The horizontal axis shows the SB number; SB0 is the low-frequency end and SB31 the high-frequency end. FIG. 4(a) shows that the further the audio level of the input audio data exceeds the masking threshold, the larger the dynamic range and the more bits are required.

[0031] FIG. 4(b) shows the state when the audio data is requantized with the number of bits allocated to each SB by the bit allocation/scale factor calculation unit 203. The vertical axis is the number of bits of audio data allocated to each SB, which is the number of quantization bits used when the requantization unit 204 requantizes the audio data of that SB. The horizontal axis shows the SB number; SB0 is the low-frequency end and SB31 the high-frequency end. As FIGS. 4(a) and 4(b) show, the more the audio level of the input audio data in an SB exceeds the masking threshold, the more bits that SB is allocated.

[0032] No bits are allocated to SB26 and SB29 because their audio levels are below the masking threshold. SB27 has an audio level above the masking threshold, but it too receives no bits, because the algorithm gives priority to SBs whose audio level exceeds the masking threshold by a larger margin.

[0033] Next, the frame format of the MPEG1 audio standard is described. FIG. 5 is a schematic diagram illustrating the arrangement of the various data in the MPEG1 audio frame format used by the audio recording/reproducing apparatus of this embodiment.

[0034] The header part contains the synchronization signal (sync word: 1111 1111 1111) that marks the start of a frame. The error check part contains error detection information. The audio data part contains, as audio sample data, the allocation data, which indicates the order in which the audio data of the 32 SBs is encoded; the scale factor data, which gives the ratio (the magnification of the compressed data) obtained when the waveform of the sample data in an SB is normalized so that its maximum amplitude is approximately 1.0; and the audio sample data (compressed data) into which the audio data has been divided. The multichannel extension (mc extension) part is the extension that turns MPEG1 into MPEG2 multichannel. The ancillary data part is used for auxiliary data.
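
As a small illustration of the sync word mentioned above, the following sketch checks whether a byte buffer begins with twelve 1 bits. It only tests the sync pattern and does not parse the rest of the header; the function name is hypothetical.

```python
SYNC_WORD = 0xFFF  # twelve 1 bits: 1111 1111 1111


def starts_with_sync(frame: bytes) -> bool:
    """Return True if the first 12 bits of `frame` equal the sync word."""
    if len(frame) < 2:
        return False
    # First byte supplies 8 bits, the top nibble of the second supplies 4.
    first12 = (frame[0] << 4) | (frame[1] >> 4)
    return first12 == SYNC_WORD


ok = starts_with_sync(bytes([0xFF, 0xFD, 0x00]))
bad = starts_with_sync(bytes([0xFF, 0xE0, 0x00]))
```

A reader that walks the recorded data frame by frame could use a check like this to confirm that it is positioned at a frame boundary before reading SF data.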

[0035] Next, the operation of detecting the audio level in the audio recording/reproducing apparatus according to one embodiment of the present invention is described. FIG. 6 is a flowchart illustrating the operation of detecting the audio level in the audio recording/reproducing apparatus of this embodiment.

[0036] First, the frame number of the first frame to be read is set to 1, and the first frame of the compressed audio data is read, transferred to the memory 111, and stored (ST601, ST602).

[0037] Next, the memory location holding the maximum scale factor (SF) value is cleared (ST603). The system microcomputer 103 reads the scale factor (SF) data of each SB from the frame data transferred to the memory 111, in order starting from the lowest-frequency SB. With MPEG1 audio Layer1 in monaural mode and a sampling frequency (fs) of 32 kHz, there are 32 SF values per frame (ST604, ST605).

[0038] In MPEG1 audio, SF data runs from +6 dB to -118 dB in 2 dB steps. By comparing the SF data read for each SB in turn and updating the maximum SF value stored in memory, the maximum SF value in the frame is obtained. The final value is taken as the maximum level data for that frame (ST606, ST607, ST608).

[0039] Once the maximum-level SF data for the first frame has been determined, the frame number is incremented by 1, the next frame is read into the memory 111, and its maximum-level SF data is obtained by the same procedure. This operation is repeated (ST609, ST610).

[0040] As this procedure shows, the maximum level in each frame can be detected by reading only the SF data, without expanding the compressed audio data.
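
The scan described in the procedure above can be sketched as follows. The sketch assumes the SF values of each frame have already been extracted and are expressed in dB; the index-to-dB helper assumes that index 0 corresponds to +6 dB and index 62 to -118 dB, which is one way to realize the stated 2 dB steps. All function names are hypothetical.

```python
def sf_index_to_db(index):
    """Map an assumed SF index (0..62) to dB: +6 dB down to -118 dB."""
    if not 0 <= index <= 62:
        raise ValueError("SF index out of range")
    return 6 - 2 * index


def frame_max_sf_db(sf_db_values):
    """Return the largest SF (in dB) among the subbands of one frame."""
    max_sf = None                          # cleared before each frame
    for sf in sf_db_values:                # read from the lowest SB upward
        if max_sf is None or sf > max_sf:  # compare and update the maximum
            max_sf = sf
    return max_sf


def scan_frames(frames):
    """Apply the per-frame scan to every frame of a track, in order."""
    return [frame_max_sf_db(frame) for frame in frames]


per_frame_max = scan_frames([[-40, -12, -30], [-6, -20, -50]])
```

No subband samples are touched at any point; only the SF side information is read, which is what keeps the level detection cheap.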

[0041] FIG. 7 is a flowchart illustrating the operation of automatically controlling the gain of the electronic volume during reproduction in the audio recording/reproducing apparatus of this embodiment.

[0042] First, an instruction to detect the audio level of a program track is given with the instruction button on the operation unit 104 (ST701).

[0043] In response to the instruction button on the operation unit 104, the system microcomputer 103 retrieves (SF)frame.max.n, the maximum SF value of each frame, in order from the first frame of the program track. Here n = 1 to N, where N is the number of frames in one program track (ST702).

[0044] The average (SF)program.av. of the per-frame maximum SF values (SF)frame.max.n in the program track is calculated (ST703). It is obtained from the following equation:

(SF)program.av. = Σ (SF)frame.max.n / N  (n = 1 to N)

where N is the number of frames in one program track.
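
The averaging step can be written directly from the equation. This is a minimal sketch that assumes the per-frame maxima are available as a list of dB values; the function name is hypothetical.

```python
def program_average(frame_max_db):
    """Average of the per-frame maximum SF values over one program track."""
    if not frame_max_db:
        raise ValueError("program track has no frames")
    return sum(frame_max_db) / len(frame_max_db)


track_level = program_average([-10, -20, -6])
```

For the three example frames the result is -12.0, a single number that summarizes the level of the whole track.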

[0045] The average (SF)program.av. obtained in ST703 is recorded in the recording information area of the recording medium that manages the program tracks (ST704).

[0046] When the reproduction start button on the operation unit 104 is operated, the system microcomputer 103 reads the average value recorded in the recording information area that manages the program tracks on the recording medium, and automatically changes the gain of the electronic volume 108 based on that value (ST705, ST706).
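
The text does not specify how the stored average is mapped to a gain. One plausible mapping, shown purely as an assumption, is to aim for a fixed target level and clamp the result to the range an electronic volume could apply. The target of -12 dB, the clamp limits, and the function name are all hypothetical.

```python
def volume_gain_db(program_av_db, target_db=-12.0,
                   min_gain_db=-40.0, max_gain_db=12.0):
    """Gain (dB) that would bring the track's average level to target_db,
    limited to the assumed adjustment range of the volume control."""
    gain = target_db - program_av_db
    return max(min_gain_db, min(max_gain_db, gain))


quiet_track_gain = volume_gain_db(-20.0)   # quieter than target: boost
loud_track_gain = volume_gain_db(0.0)      # louder than target: attenuate
```

Clamping matters in practice because a near-silent track would otherwise request an unreasonably large boost.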

[0047] After the gain of the electronic volume 108 has been changed, the audio data is demodulated, input to the audio reproduction circuit unit 112, and output as an analog signal (ST707).

[0048] FIG. 8 is a flowchart illustrating the operation of controlling the gain of the electronic volume in another audio recording/reproducing apparatus of this embodiment, in the case where the data in the auxiliary data part of the frame format signal is not used.

[0049] First, an instruction to detect the audio level of a program track is given with the instruction button on the operation unit 104 (ST801).

[0050] For each frame, the SF data (SF)i given to the 32 SBs is read, and the average (SF)frame.av.n of the values (SF)i (i = 0 to 31) is obtained. For frame n it is expressed by the following equation:

(SF)frame.av.n = Σ (SF)i / 32  (i = 0 to 31)

The resulting value of (SF)frame.av.n is stored in internal memory as the average audio level of that frame (ST802).

[0051] Using the average audio levels (SF)frame.av.n (n = 1 to N) of the frames obtained in ST802, the average (SF)program.av. over the whole program track is calculated (ST803). It is obtained from the following equation:

(SF)program.av. = Σ (SF)frame.av.n / N  (n = 1 to N)

where N is the number of frames in one program track.
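
The two-stage averaging of this variant can be sketched as follows, assuming each frame is given as a list of its SF values in dB. The function names are hypothetical, and the frame average divides by the number of SF values supplied, which is 32 in the case described.

```python
def frame_average(sf_db_values):
    """Average SF of one frame over its subbands."""
    return sum(sf_db_values) / len(sf_db_values)


def program_average_from_frames(frames):
    """Average of the per-frame averages over all frames of a track."""
    averages = [frame_average(frame) for frame in frames]
    return sum(averages) / len(averages)


# Two example frames with 32 identical SF values each.
frames = [[-10] * 32, [-30] * 32]
track_average = program_average_from_frames(frames)
```

Compared with taking the per-frame maximum, averaging over all 32 SBs gives a figure that is less sensitive to a single loud subband.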

[0052] The average value data (SF)program.av. obtained in ST803 is recorded in the recording information area of the recording medium that manages the program tracks (ST804).

[0053] When the reproduction start button on the operation unit 104 is operated, the system microcomputer 103 reads the average value data (SF)program.av. recorded in the recording information area of the recording medium, and automatically changes the gain of the electronic volume 108 based on that value (ST805, ST806).

[0054] After the gain of the electronic volume 108 has been changed, the audio data is demodulated, input to the audio reproduction circuit unit 112, and output as an analog signal (ST807).

【００５５】各プログラム曲の平均音声レベルの情報又
は最初のプログラム曲の音声レベルを基準とした各プロ
グラム曲の平均音声レベルの偏差の情報をプログラム曲
を管理する記録情報として記録媒体に記録することによ
り、再生時に圧縮音声データをデコードし音声レベルを
確認するという動作を行わなくてもプログラム曲の音声
レベルを知ることができる。プログラム曲の再生を指示
したとき、それぞれのプログラム曲の記録情報である平
均音声レベルのデータを読み取ることにより、電子ボリ
ューム３０８の利得を調整して聴感上同じ程度のレベル
で連続して再生できるようになる。By recording on the recording medium, as recording information for managing the program music, either the average audio level of each program music piece or the deviation of the average audio level of each program music piece from the audio level of the first program music piece, the audio level of a program music piece can be known at reproduction time without decoding the compressed audio data to check the level. When reproduction of program music is instructed, the average audio level data recorded for each program music piece is read and the gain of the electronic volume 308 is adjusted, so that the pieces can be reproduced continuously at approximately the same perceived level.
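
The deviation-based variant described above can be sketched as follows. The function names are hypothetical, the deviation is assumed to be a simple difference from the first piece, and the compensating gain assumes linear level values; the text specifies neither choice.

```python
def level_deviations(program_averages):
    """Deviation of each piece's average level from that of the first piece."""
    if not program_averages:
        raise ValueError("at least one program music piece is required")
    first = program_averages[0]
    return [avg - first for avg in program_averages]


def matching_gains(program_averages):
    """Linear gains that bring every piece to the level of the first piece
    (assumes positive, linear level values)."""
    first = program_averages[0]
    return [first / avg for avg in program_averages]
```

Either list could be stored once per recording medium, so that continuous reproduction only needs a table lookup per piece.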

【００５６】次に、本発明の第２の実施例の音声記録再
生装置について説明する。図３は、本発明の第２の実施
例の音声記録再生装置のエンコードデコード部の構成を
説明する図である。Next, an audio recording / reproducing apparatus according to a second embodiment of the present invention will be described. FIG. 3 is a diagram for explaining the configuration of the encode / decode section of the audio recording / reproducing apparatus according to the second embodiment of the present invention.

【００５７】図３に示される音声記録再生装置の構成
は、図２に示される音声記録再生装置の構成のスケール
ファクタ最大値検出部２０７を音声レベル検出部２０７
ａに換えたものであり、その他の構成は同じである。音
声レベル検出部２０７ａは、音声圧縮処理する前の入力
音声データの音声レベルを検出する。The configuration of the audio recording / reproducing apparatus shown in FIG. 3 is that of the apparatus shown in FIG. 2 with the scale factor maximum value detection unit 207 replaced by an audio level detection unit 207a; the rest of the configuration is the same. The audio level detection unit 207a detects the audio level of the input audio data before the audio compression processing.

【００５８】検出された音声レベルのデータは、フォー
マティング部２０５に送られ、所定のフレームフォーマ
ットの補助データ部に付加されたのち、記録媒体１０１
に記録される。本実施例においては、所定のフレームフ
ォーマットの補助データ部に付加された音声レベルのデ
ータは、同じフレームの音声レベルをそのまま表すもの
ではなく、同じ音声レベルのデータが複数のフレームに
わたって配置されることもある。The detected audio level data is sent to the formatting unit 205, added to the auxiliary data section of a predetermined frame format, and then recorded on the recording medium 101. In this embodiment, the audio level data added to the auxiliary data section of the predetermined frame format does not directly represent the audio level of that same frame; the same audio level data may also be arranged over a plurality of frames.
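
One way the same audio level data could end up arranged over a plurality of frames is sketched below in Python. This is only an illustration under assumptions the text does not make: the level is measured per block of `frames_per_block` frames, the block value is taken as the peak of the block, and every frame in the block carries that value in its auxiliary data section. The function name and parameters are hypothetical.

```python
def assign_levels_to_frames(frame_levels, frames_per_block=4):
    """Return, for each frame, the level value to place in its auxiliary data section.
    Every frame in a block carries the same value (assumed: the block's peak level)."""
    if frames_per_block < 1:
        raise ValueError("frames_per_block must be at least 1")
    aux_levels = []
    for start in range(0, len(frame_levels), frames_per_block):
        block = frame_levels[start:start + frames_per_block]
        block_level = max(block)
        # Repeat the single block value for every frame in the block.
        aux_levels.extend([block_level] * len(block))
    return aux_levels
```

The output has one entry per input frame, so the frame format itself is unchanged; only the value written to the auxiliary data section is shared within a block.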

【００５９】音声データの再生時においては、その音声
データの音声レベルを補助データ部に付加されたデータ
を用いて求めたプログラム曲の平均音声レベルのデータ
に基づいて電子ボリュームの利得を変えることは本発明
の第１の実施例と同じである。At the time of reproducing the audio data, the gain of the electronic volume is changed based on the average audio level data of the program music, obtained using the data added to the auxiliary data section, in the same way as in the first embodiment of the present invention.

【００６０】上述した説明の音声記録再生装置は、記録
媒体としてＩＣメモリを使用したが、記録媒体としてＣ
Ｄ(Compact Disc)、ＭＤ(Mini Disc)、ＤＶＤ(Digital
Versatile Disc)、ＰＤ(Phase Change Disc)を使用した
光ディスク、光磁気ディスク又は相変化光ディスクであ
っても良く、又、磁気ハードディスクであっても良い。The audio recording / reproducing apparatus described above uses an IC memory as the recording medium, but the recording medium may instead be an optical disk, a magneto-optical disk, or a phase-change optical disk such as a CD (Compact Disc), MD (Mini Disc), DVD (Digital Versatile Disc), or PD (Phase Change Disc), or may be a magnetic hard disk.

【００６１】[0061]

【発明の効果】本発明により、サブバンド（ＳＢ）の音
声データをデコードしなくても、記録されている音声レ
ベルを容易に検出することができ、検出した音声レベル
のデータに基づいて再生する音声の出力レベルを制御し
て再生する複数のプログラム曲の音声レベルをほぼ同じ
とする音声記録再生装置及び記録媒体を得ることがであ
る。According to the present invention, the recorded audio level can be easily detected without decoding the audio data of the sub-bands (SB), and an audio recording / reproducing apparatus and a recording medium can be obtained in which the output level of the reproduced audio is controlled based on the detected audio level data so that the audio levels of a plurality of reproduced program tunes are substantially the same.

[Brief description of the drawings]

【図１】本発明の一実施例の音声再生装置の構成を示す
ブロック図である。FIG. 1 is a block diagram illustrating a configuration of an audio playback device according to an embodiment of the present invention.

【図２】本発明の第１の実施例である音声記録再生装置
のエンコード／デコード部を説明する図である。FIG. 2 is a diagram illustrating an encoding / decoding unit of the audio recording / reproducing apparatus according to the first embodiment of the present invention.

【図３】本発明の第２の実施例である音声記録再生装置
のエンコード／デコード部を説明する図である。FIG. 3 is a diagram illustrating an encoding / decoding unit of an audio recording / reproducing apparatus according to a second embodiment of the present invention.

【図４】音声を圧縮伸張して記録再生する音声記録再生
装置において、入力音声データが異なるＳＢに分割され
てビットが割り当てられた音声データを説明する図であ
る。FIG. 4 is a diagram illustrating audio data in which input audio data is divided into different SBs and bits are assigned in an audio recording / reproducing apparatus that records and reproduces audio by compressing and expanding the audio.

【図５】本実施例の音声記録再生装置に使用するＭＰＥ
Ｇ１オーディオのフレームフォーマットにおける各種デ
ータの配置を説明する模式図である。FIG. 5 is a schematic diagram illustrating the arrangement of various data in the MPEG1 audio frame format used in the audio recording / reproducing apparatus of the embodiment.

【図６】本実施例の音声記録再生装置における音声レベ
ルを検出する動作を説明するフローチャートである。FIG. 6 is a flowchart illustrating an operation of detecting an audio level in the audio recording / reproducing apparatus of the embodiment.

【図７】本実施例の音声記録再生装置において、再生時
に自動的に電子ボリュームの利得を制御する動作を説明
するフローチャートである。FIG. 7 is a flowchart illustrating an operation of automatically controlling the gain of the electronic volume at the time of reproduction in the audio recording / reproducing apparatus of the present embodiment.

【図８】本実施例の他の音声記録再生装置において、フ
レームフォーマット信号の補助データ部を使用しない場
合の電子ボリュームの利得を制御する動作を説明するフ
ローチャートである。FIG. 8 is a flowchart illustrating an operation of controlling the gain of the electronic volume in a case where the auxiliary data portion of the frame format signal is not used in another audio recording / reproducing apparatus of the embodiment.

[Explanation of symbols]

１０１記録媒体（ＩＣメモリ）、１０２イン
ターフェース部１０３システムマイコン、１０４操作
部１０５表示部、１０６Ｄ／
Ｆ１０７Ｄ／Ａ、１０８電子
ボリューム１０９バッファ回路、１１０エン
コード／デコード部１１１メモリ、１１２音声
再生回路部２０１帯域分割部、２０２マス
キング閾値算出部２０３ビットアロケーション／スケールファクター
演算部２０４再量子化部、２０５フォ
ーマッティング部２０６レベル比較判別部２０７スケールファクター最大値検出部２０７ａ音声レベル検出部
Reference Signs List
101 recording medium (IC memory)
102 interface unit
103 system microcomputer
104 operation unit
105 display unit
106 D/F
107 D/A
108 electronic volume
109 buffer circuit
110 encoding / decoding unit
111 memory
112 audio reproduction circuit unit
201 band division unit
202 masking threshold calculation unit
203 bit allocation / scale factor calculation unit
204 requantization unit
205 formatting unit
206 level comparison / determination unit
207 scale factor maximum value detection unit
207a audio level detection unit

Claims

[Claims]

1. An audio recording / reproducing apparatus that compresses and expands an audio signal to record and reproduce it, wherein the audio signal is divided into sub-bands of different audio frequency bands and quantized into audio data, the apparatus comprising: masking threshold calculating means for calculating a masking threshold; level comparing means for comparing the level of the audio data with the masking threshold; bit allocation means for allocating bits to the sub-bands based on the comparison result of the level comparing means; requantization means for requantizing the audio data with the number of bits allocated to each sub-band by the bit allocation means; audio level detection means for detecting the audio level of the audio signal or the audio data; signal conversion means for adding data indicating the audio level detected by the audio level detection means to an auxiliary data section of a frame format and converting the result into a signal of the frame format; and control means for controlling the gain of audio level control means based on the data indicating the audio level when the signal of the frame format is reproduced.

2. The audio recording / reproducing apparatus according to claim 1, wherein the bit allocation means calculates a scale factor of the sample data of each sub-band, the audio level detection means calculates the maximum value of the scale factor, and the control means adds the maximum value of the scale factor to the auxiliary data section.

3. The audio recording / reproducing apparatus according to claim 1, wherein the audio level detection means detects the audio level of the audio signal or audio data input to the audio recording / reproducing apparatus, and the control means adds the audio level to the auxiliary data section.

4. A recording medium on which a signal is recorded in a frame format in which scale factor data and compressed audio data are arranged, wherein the maximum value or the average value of the scale factor is arranged and recorded in an auxiliary data section of the frame format.