JPH08307277A

JPH08307277A - Method and device for variable rate voice coding

Info

Publication number: JPH08307277A
Application number: JP7113361A
Authority: JP
Inventors: Shinichi Obata; 信一小畑; Masafumi Nakamura; 雅文中村; Toshifumi Takeuchi; 敏文竹内
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-05-11
Filing date: 1995-05-11
Publication date: 1996-11-22
Anticipated expiration: 2017-07-15
Also published as: JP3301886B2

Abstract

PURPOSE: To generate coded data that reproduces the best possible sound quality on playback by varying the local coding rate according to the psychoacoustic information amount of the audio, so that data capacity is distributed optimally in the audio encoder. CONSTITUTION: An acoustic information amount calculator 19 calculates the acoustic information amount of a frame acoustic characteristic signal 18 and determines a bit allocation signal 20 for quantization according to that amount. A coding rate redistributor 16 corrects the distribution of coding rates within N frames so that the mean coding rate over the N frames equals a designated rate signal 14, and outputs a new coding rate 15 for each frame. The acoustic information amount calculator 19 performs bit allocation according to the new coding rate to determine the allocation signal 20. A requantizer requantizes a digital audio signal 8 according to the allocation signal 20 and outputs a requantized signal 21.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an audio encoding device for recording audio on a high-density recording disc, and in particular to an audio compression coding method and device that generate coded data capable of reproducing the best sound quality within the limits of a fixed transmission rate or of the recording density of a recording medium.

[0002]

[Prior Art] One example of a method for high-efficiency coding at constant sound quality is described in Japanese Patent Laid-Open No. 4-192724.

[0003]

[Problem to Be Solved by the Invention] Because of the specialized nature of its processing, the above conventional method is effective for speech but has little effect on general audio signals, particularly in digital audio systems that aim for CD-level quality. In addition, existing digital audio systems that use audio compression fix the transmission rate, so attack portions degrade markedly because too few bits are available. Conversely, portions with little audio information are given more data capacity than they need. Data capacity is therefore not used effectively, and there is room to improve sound quality.

[0004]

[Means for Solving the Problem] In the present invention, the psychoacoustic information amount of the input audio signal is calculated, and a local coding rate is determined according to the magnitude of that amount. The local coding rate is controlled so that its average over a predetermined number of frames is constant.

[0005]

[Operation] The acoustic information amount calculating means estimates the acoustic information amount, that is, the psychoacoustic information amount of the input digital audio signal. As this amount rises and falls from one coding unit to the next, the coding rate distributing means raises and lowers the coding rate for each coding unit accordingly. As a result, coding units with a large acoustic information amount are assigned a high coding rate, and coding units with a small acoustic information amount are assigned a low coding rate.

[0006]

[Embodiment] FIG. 1 shows one embodiment of the present invention: a flowchart of an encoding process consisting of seven processing steps. The steps are described in order.

[0007] In processing step 1, the digital audio signal is input frame by frame, the frame being the basic unit of encoding. In processing step 2, a first encoding pass is applied to that digital audio signal to calculate the acoustic information amount, that is, the psychoacoustic information amount contained in the digital audio signal within the frame.

[0008] FIG. 2 shows an example in which the signal is divided into 32 frequency bands, numbered 0 to 31, to calculate the acoustic information amount. FIG. 2(a) shows, for each band, the signal power level S(i) and the mask level M(i), which is the threshold level below which sound becomes inaudible because of in-band masking, spreading between bands, and so on; the size of the difference SMR(i) between S(i) and M(i) is indicated by the shaded area. FIG. 2(b) shows SMR(i) extracted for each band. The sum of SMR(i) over the bands up to band sbmax (sbmax need not be 31) is used as the acoustic information amount.
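The calculation of FIG. 2 can be sketched as follows. This is a minimal illustration, assuming the per-band levels S(i) and M(i) are already available in dB and that the sum starts at band 0; the function name `acoustic_information_smr` and its parameter names are hypothetical and do not appear in the source.

```python
from typing import Sequence


def acoustic_information_smr(signal_db: Sequence[float],
                             mask_db: Sequence[float],
                             sbmax: int = 31) -> float:
    """Sum SMR(i) = S(i) - M(i) over bands 0..sbmax (inclusive).

    Assumption: the sum starts at band 0; the text only says "up to sbmax".
    """
    if len(signal_db) != len(mask_db):
        raise ValueError("S and M must have the same number of bands")
    if not 0 <= sbmax < len(signal_db):
        raise ValueError("sbmax must index an existing band")
    last = sbmax + 1
    return sum(s - m for s, m in zip(signal_db[:last], mask_db[:last]))
```

Lowering `sbmax` simply excludes the upper bands from the estimate, which corresponds to the remark that sbmax need not be 31.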

[0009] In processing step 3, the coded data from the first encoding pass and an index number representing the acoustic information amount are stored. In processing step 4, it is determined whether data for N frames (N is a natural number of 2 or more) has accumulated. If not, the process returns to step 1 and the next frame is input; if so, the process proceeds to step 5. In processing step 5, the stored data for the N frames is read back. In step 6, the rise and fall of the acoustic information amount within the N frames is determined from the index numbers read back. The coding rate for each frame is then determined so that it varies with that frame's acoustic information amount and so that the average coding rate over the N frames equals a predetermined designated rate. In processing step 7, the second encoding pass, which is the main encoding, is performed according to the coding rates thus determined, and the data stream is formed.
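The seven-step flow can be sketched as a generator that buffers N frames before encoding them. This is only an illustration under stated assumptions: the names `encode_in_groups`, `estimate`, `allocate`, and `encode` are hypothetical, the sketch buffers the input frames rather than first-pass coded data, and the handling of a trailing group shorter than N frames (flushed with its own allocation) is an assumption the source does not address.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def encode_in_groups(frames: Iterable[Frame],
                     n: int,
                     estimate: Callable[[Frame], float],
                     allocate: Callable[[Sequence[float]], Sequence[float]],
                     encode: Callable[[Frame, float], bytes]) -> Iterator[bytes]:
    """Yield coded frames, deciding coding rates one N-frame group at a time."""
    if n < 2:
        raise ValueError("N must be a natural number of 2 or more")
    buffered: List[Frame] = []
    indices: List[float] = []
    for frame in frames:                      # step 1: input one frame
        indices.append(estimate(frame))       # step 2: first pass, information amount
        buffered.append(frame)                # step 3: store frame and index
        if len(buffered) < n:                 # step 4: N frames accumulated?
            continue
        rates = allocate(indices)             # steps 5-6: read back, decide rates
        for f, rate in zip(buffered, rates):
            yield encode(f, rate)             # step 7: main encoding
        buffered.clear()
        indices.clear()
    if buffered:                              # assumption: flush a short final group
        for f, rate in zip(buffered, allocate(indices)):
            yield encode(f, rate)
```

The caller supplies the estimator, the rate allocation rule, and the actual encoder, so the sketch fixes only the buffering order described in steps 1 to 7.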

[0010] FIG. 3 shows a concrete example in which the coding rate for each frame is adjusted as the acoustic information amount rises and falls, plotted against the frame number, which corresponds to the time axis. FIG. 3(a) shows how the acoustic information amount, obtained by the procedure of FIG. 2, changes from frame to frame. The coding rate is given the same rising and falling profile, as shown in FIG. 3(b). This keeps the sound quality constant; in practice, however, constraints such as those of the recording medium usually require a constant recording rate. In this example, therefore, 10 frames are treated as one audio group (N = 10), and the coding rate for each frame is adjusted so that the average coding rate within one audio group equals the designated rate.
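One simple way to satisfy both conditions (rates that follow the information amount, and a group average equal to the designated rate) is proportional scaling. The source does not specify the allocation rule, so the proportional rule below, the function name `redistribute_rates`, and the uniform fallback for a silent group are all assumptions made for illustration.

```python
from typing import List, Sequence


def redistribute_rates(info: Sequence[float], designated_rate: float) -> List[float]:
    """Return per-frame coding rates proportional to the information amounts.

    The rates are scaled so that their mean over the group equals
    designated_rate. Assumption: information amounts are non-negative.
    """
    n = len(info)
    if n == 0:
        raise ValueError("the group must contain at least one frame")
    if any(x < 0 for x in info):
        raise ValueError("information amounts must be non-negative")
    total = sum(info)
    if total == 0:
        # Assumption: with no information anywhere, fall back to a flat rate.
        return [float(designated_rate)] * n
    return [designated_rate * n * x / total for x in info]
```

For a group with information amounts 1, 2, 3, 4 and a designated rate of 128, the rates rise with the information amount while their mean stays at 128. A practical encoder would additionally clamp each rate to the range its bitstream format supports, which this sketch does not attempt.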

[0011] By determining the coding rate in accordance with the frame-by-frame change in the acoustic information amount, taking N frames as a unit, degradation of sound quality is evened out, and coded data can be generated that maintains the best sound quality within the available data capacity.

[0012] FIG. 4 shows an example in which the acoustic information amount is obtained by a different calculation: the sum of the signal power S(i) of each band over the bands up to band sbmax (sbmax need not be 31) is used as the acoustic information amount. Because this example estimates the amount simply from the sound pressure level in each band, without considering the threshold of audibility or inter-band masking, high-frequency signals and low-level signals are not discarded and are also reflected in the acoustic information amount.

[0013] FIG. 5 shows an example in which the per-band SMR(i) obtained as in FIG. 2 is weighted differently for each band. Here, real numbers K(i) satisfying K(i) ≥ K(i+1) are used; the product of K(i) and SMR(i) is obtained for each band, and the sum of these products over the bands up to band sbmax (sbmax need not be 31) is used as the acoustic information amount.

[0014] If bands 0 to 31 are obtained by dividing the range up to the maximum frequency into equal parts, each band spans a larger Bark width the lower its frequency, and therefore has a greater psychoacoustic effect. Applying weighting like that of FIG. 5 in such a case gives a more accurate estimate of the acoustic information amount.
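The weighted estimate of FIG. 5 can be sketched as below. The function name `weighted_acoustic_information` and the example weights are hypothetical; the only constraint taken from the text is that K(i) must not increase with the band number. The start of the sum at band 0 is an assumption.

```python
from typing import Sequence


def weighted_acoustic_information(smr: Sequence[float],
                                  weights: Sequence[float],
                                  sbmax: int) -> float:
    """Sum K(i) * SMR(i) over bands 0..sbmax, requiring K(i) >= K(i+1)."""
    last = sbmax + 1
    if sbmax < 0 or last > len(smr) or last > len(weights):
        raise ValueError("sbmax must index an existing band and weight")
    if any(weights[i] < weights[i + 1] for i in range(last - 1)):
        raise ValueError("weights must satisfy K(i) >= K(i+1)")
    return sum(k * v for k, v in zip(weights[:last], smr[:last]))
```

With equal SMR in every band, non-increasing weights make the low bands contribute more to the total, matching the remark that equal-width low bands span more of the Bark scale.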

[0015] FIG. 6 applies the same weighting as FIG. 5 to the example of FIG. 4: the product of K(i) and S(i) is obtained for each band, and the sum of these products over the bands up to band sbmax (sbmax need not be 31) is used as the acoustic information amount. This allows a simple estimate on a psychoacoustic axis without converting to the Bark axis.
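For reference, the four estimates of FIGS. 2, 4, 5, and 6 can be written side by side. The symbols I with subscripts are labels introduced here for illustration only, and the lower summation limit of 0 is an assumption (the description says only "up to band sbmax"; the claims generalize the range to bands j through k).

```latex
\begin{align*}
I_{\mathrm{SMR}} &= \sum_{i=0}^{sb_{\max}} \bigl(S(i) - M(i)\bigr) && \text{(FIG. 2)} \\
I_{S} &= \sum_{i=0}^{sb_{\max}} S(i) && \text{(FIG. 4)} \\
I_{K\cdot\mathrm{SMR}} &= \sum_{i=0}^{sb_{\max}} K(i)\,\bigl(S(i) - M(i)\bigr) && \text{(FIG. 5)} \\
I_{K\cdot S} &= \sum_{i=0}^{sb_{\max}} K(i)\,S(i) && \text{(FIG. 6)}
\end{align*}
\text{where } \mathrm{SMR}(i) = S(i) - M(i) \text{ and } K(i) \ge K(i+1).
```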

[0016] FIG. 7 shows one embodiment of an encoding device according to the present invention. The device comprises: an audio encoder 11 that receives a digital audio signal 8 and outputs a coded bitstream 9 and an index 10 representing the acoustic information amount; a recording head 12; a recording medium 13 such as an optical disc or magnetic disc; and a coding rate redistributor 16 that receives the acoustic information amount index 10 and a designated rate signal 14 and outputs a corrected coding rate index 15. The audio encoder 11 in turn contains a requantizer 17 that requantizes the digital audio signal 8; an acoustic information amount calculator 19 that calculates the psychoacoustic information amount of the digital audio signal 8 and estimates the data capacity to be used; and a frame composer 22 that receives the requantized signal 21 and outputs the coded bitstream 9. The operation is described in order.

[0017] First, the acoustic information amount is estimated by the first encoding pass. The digital audio signal 8 is input to the audio encoder 11 frame by frame, the frame being the basic unit of encoding. The acoustic information amount calculator 19 receives a frame acoustic characteristic signal 18, which is either the one-frame digital audio signal 8 itself or the spectrum obtained by time-frequency transformation of the digital audio signal 8, and calculates the acoustic information amount, that is, the psychoacoustic information amount. According to this amount, it determines the bit allocation for quantization so that the quantization noise level is at or below the level of detectability. It then outputs a bit allocation signal 20 and the acoustic information amount index 10.
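A bit allocation that keeps quantization noise at or below the mask level can be sketched with the familiar approximation that each bit of a uniform quantizer lowers quantization noise by roughly 6.02 dB. That rule of thumb, the constant `DB_PER_BIT`, and the function name `bits_to_mask_noise` are assumptions brought in for illustration; the source states only the goal, not the rule used to reach it.

```python
import math
from typing import List, Sequence

# Assumption: about 6.02 dB of signal-to-noise ratio per quantizer bit.
DB_PER_BIT = 6.02


def bits_to_mask_noise(smr_db: Sequence[float]) -> List[int]:
    """Return, per band, the fewest bits whose noise falls at or below the mask.

    Bands whose signal is already below the mask (SMR <= 0) get zero bits.
    """
    return [math.ceil(max(v, 0.0) / DB_PER_BIT) for v in smr_db]
```

Summing the per-band bit counts gives one possible measure of the data capacity a frame would need before any rate redistribution is applied.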

[0018] Next, the second encoding pass (the main encoding) is performed on the same audio data. The digital audio signal 8 is again input to the audio encoder 11 frame by frame. The acoustic information amount calculator 19 calculates the acoustic information amount of the frame acoustic characteristic signal 18 and determines the bit allocation 20 for quantization according to that amount. Here, the coding rate redistributor 16 corrects the distribution of coding rates within the N frames so that the average, over the N frames, of the coding rates determined in the first encoding pass equals the designated rate signal 14, and outputs a new coding rate 15 for each frame. Bit allocation is performed in the acoustic information amount calculator 19 according to this new coding rate, and the allocation signal 20 is determined. The requantizer 17 requantizes the digital audio signal 8 according to the allocation signal 20 and outputs the requantized signal 21. The frame composer 22 forms the coded bitstream 9 from the requantized signal 21 and outputs it. The coded bitstream 9 is recorded on the recording medium 13 by the recording head 12.

[0019] In this way, the first encoding pass captures how the acoustic information amount changes within the N frames, and the second encoding pass encodes at coding rates matched to the acoustic information amount. As a result, the degree of sound quality degradation is evened out, and a bitstream 9 giving the best sound quality within the limited data capacity can be generated. The recording medium 13 need not be a disc; it may be a tape or another medium. Furthermore, instead of being recorded on a recording medium, the data may be sent out over a transmission path.

[0020] FIG. 8 adds to the embodiment of FIG. 7 an N-frame audio memory circuit 23 that stores the input signal 8.

[0021] The digital audio signal 8 is input to the memory circuit 23, which outputs an audio signal 25 for estimating the acoustic information amount and an audio signal 24 for the main encoding. The acoustic information amount calculator 19 receives the audio signal 25, calculates the acoustic information amount, that is, the psychoacoustic information amount, and outputs the acoustic information amount index 10. The coding rate redistributor 16 accumulates the acoustic information amount index 10 for N frames and determines coding rates 15 that rise and fall with the acoustic information amount index 10 and whose average over the N frames equals the designated rate signal 14.

[0022] After the coding rate for each of the N frames has been determined, the main encoding proceeds: the audio signal 24 is input to the requantizer 17, and the frame acoustic characteristic signal 18 is sent to the acoustic information amount calculator 19. The acoustic information amount calculator 19 calculates the acoustic information amount of the frame acoustic characteristic signal 18 and determines the bit allocation 20 for quantization according to that amount. At this point the coding rate 15 is input in time with the audio signal 24, and the allocation 20 is determined under the constraint of the coding rate 15. The requantizer 17 requantizes the audio signal 24 according to the allocation signal 20 and outputs the requantized signal 21. The frame composer 22 forms the coded bitstream 9 from the requantized signal 21 and outputs it. The coded bitstream 9 is recorded on the recording medium disc 13 by the recording head 12.

[0023] In this way, the change in the acoustic information amount within the N frames is captured, and the main encoding is performed at coding rates that vary with the acoustic information amount. As a result, the degree of sound quality degradation is evened out, and a bitstream 9 giving the best sound quality within the limited data capacity can be generated.

[0024] Because estimation of the acoustic information amount and the main encoding can thus be alternated within a fixed period, encoding equivalent to that of FIG. 4 can be carried out in a short time and with a smaller recording delay.

[0025] FIG. 9 shows an example that replaces the coding rate redistributor 16 of FIG. 7 with the following: a coding rate determiner 27 that receives a margin MARG 28 and the acoustic information amount 26 and determines the coding rate 29 for a frame so that data capacity exceeding the inherently required acoustic information amount 26 by the margin MARG 28 (MARG is a real number) is secured; an average calculator 30 that outputs the average AVE 31 of the coding rates 29 up to the present; and a margin adjustment circuit 34 that receives the average AVE 31, a predetermined upper-limit average HIGH 32, and a predetermined lower-limit average LOW 33, and controls MARG 28 so that it is decreased when AVE 31 reaches HIGH 32 or more and increased when AVE 31 falls to LOW 33 or less.

[0026] The digital audio signal 8 is input to the audio encoder 11 frame by frame, the frame being the basic unit of encoding. The acoustic information amount calculator 19 receives the frame acoustic characteristic signal 18, which is either the one-frame digital audio signal 8 itself or the spectrum obtained by time-frequency transformation of the digital audio signal 8, and calculates the acoustic information amount 26, that is, the psychoacoustic information amount. The coding rate determiner 27 determines the coding rate 29 so that the margin 28 is available on top of the acoustic information amount 26. According to the coding rate 29, the acoustic information amount calculator 19 determines the bit allocation 20 for quantization, and outputs the bit allocation signal 20 and the acoustic information amount 26. The requantizer 17 requantizes the digital audio signal 8 according to the allocation signal 20 and outputs the requantized signal 21. The frame composer 22 forms the coded bitstream 9 from the requantized signal 21 and outputs it. The coded bitstream 9 is recorded on the recording medium disc 13 by the recording head 12.

[0027] In parallel with this, the average calculator 30 calculates the average AVE 31 of the coding rates 29 up to the present. The margin adjuster 34 decreases MARG 28 when AVE 31 reaches HIGH 32 or more, increases MARG 28 when AVE 31 falls to LOW 33 or less, and sends MARG 28 to the coding rate determiner 27.
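The feedback loop of FIG. 9 can be sketched as a small controller. The class name `MarginRateController`, the fixed adjustment `step`, and the mapping "rate = information amount + MARG, floored at zero" are assumptions for illustration; the source says only that MARG is decreased or increased and that the rate leaves MARG of headroom over the information amount.

```python
class MarginRateController:
    """Per-frame rate decision with a running-average margin adjustment."""

    def __init__(self, low: float, high: float,
                 step: float = 1.0, margin: float = 0.0) -> None:
        if low > high:
            raise ValueError("LOW must not exceed HIGH")
        self.low = low          # lower-limit average LOW 33
        self.high = high        # upper-limit average HIGH 32
        self.step = step        # assumption: fixed adjustment per frame
        self.margin = margin    # MARG 28; may be negative
        self._total = 0.0
        self._count = 0

    def next_rate(self, info: float) -> float:
        """Return the coding rate for one frame and update MARG."""
        # Coding rate determiner 27 (assumed mapping: info plus margin).
        rate = max(0.0, info + self.margin)
        # Average calculator 30: cumulative average of all rates so far.
        self._total += rate
        self._count += 1
        ave = self._total / self._count
        # Margin adjuster 34.
        if ave >= self.high:
            self.margin -= self.step
        elif ave <= self.low:
            self.margin += self.step
        return rate
```

Because the decision for each frame needs only the current information amount and the running average, no N-frame buffer is required, which is what makes the scheme suitable for near-real-time operation.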

[0028] In this way, variable-rate encoding can be performed in nearly real time. Moreover, if the boundaries between the audio groups described above are poorly timed relative to changes in the acoustic information amount, an audio group whose information amount is at or above the upper limit may end up adjacent to one at or below the lower limit; in this example the average is computed cumulatively, so that problem is avoided. MARG 28 need not be positive and may be negative. When MARG 28 is negative, its absolute value indicates the degree of degradation, but it remains true that the larger MARG 28 is, the higher the coding rate that is selected.

[0029] FIG. 10 shows another example of the present invention: a flowchart of an encoding process consisting of three steps, step 35, step 36, and step 37, in which step 35 further consists of three steps, step 38, step 39, and step 40, and step 37 further consists of two steps, step 41 and step 42. The steps are described in order.

[0030] First, the first encoding pass is performed in processing step 35. In processing step 38, the first input of the digital audio signal is performed frame by frame, the frame being the basic unit of encoding. In processing step 39, a first encoding pass is applied to that digital audio signal to calculate the acoustic information amount, that is, the psychoacoustic information amount contained in the digital audio signal within the frame.

[0031] In processing step 40, the index representing the acoustic information amount obtained by the first encoding pass is stored. Steps 38, 39, and 40 are repeated for each frame.

[0032] After the first encoding pass over the audio signal is complete, in processing step 36 the coding rate for each frame is determined from the stored acoustic information amount indices, following the frame-by-frame rise and fall of the acoustic information amount.

[0033] Next, the second encoding pass is performed in processing step 37. In processing step 41, the second input of the digital audio signal is performed frame by frame. In processing step 42, the second encoding pass, which is the main encoding, is performed according to the coding rate determined for each frame, and the data stream is formed. Steps 41 and 42 are repeated for each frame.

[0034] By determining the coding rate in accordance with the frame-by-frame change in the acoustic information amount in this way, degradation of sound quality is evened out, and coded data that maintains the best sound quality can be generated.

[0035]

[Effects of the Invention] Because the coding rate can be determined according to the psychoacoustic information amount, capacity left over in portions that need few bits is redirected to portions that need many. Distortion due to encoding is thereby evened out and kept to a minimum, and the best sound quality can be maintained.

[Brief description of drawings]

FIG. 1 shows one embodiment of the processing flow of a digital audio encoding method to which the present invention is applied.

FIG. 2 shows an example in which SMR is estimated as the acoustic information amount in the present invention.

FIG. 3 shows an example of how the coding rate for individual frames changes when the present invention is applied.

FIG. 4 shows an example in which signal power is estimated as the acoustic information amount in the present invention.

FIG. 5 shows an example in which SMR weighted band by band is estimated as the acoustic information amount in the present invention.

FIG. 6 shows an example in which signal power weighted band by band is estimated as the acoustic information amount in the present invention.

FIG. 7 shows one embodiment of a digital audio encoding device to which the present invention is applied.

FIG. 8 shows one embodiment of a digital audio encoding device to which the present invention is applied, with an input data memory circuit added.

FIG. 9 shows one embodiment of a digital audio encoding device to which the present invention is applied, enabling real-time processing.

FIG. 10 shows another embodiment of the processing flow of a digital audio encoding method to which the present invention is applied.

[Explanation of symbols]

8: digital audio signal
9: coded bitstream
10: acoustic information amount index
11: audio encoder
15: corrected coding rate index
16: coding rate redistributor
19: acoustic information amount calculator

Claims

[Claims]

1. An audio coding method of the audio compression coding type, in which a digital audio signal is input in predetermined coding units, an acoustic information amount is calculated by estimating the psychoacoustic information amount of the digital audio signal, an allocation signal indicating the allocation of data capacity for compressing the digital audio signal is output on the basis of the acoustic information amount, a requantized signal is generated from the digital audio signal according to the allocation signal, and a coded bitstream is formed from the requantized signal, characterized in that the coding rate for a coding unit is increased when the acoustic information amount of that coding unit increases and is decreased when the acoustic information amount decreases.

2. An audio coding method of the audio compression coding type, in which a digital audio signal is input in predetermined coding units, an acoustic information amount is calculated by estimating the psychoacoustic information amount of the digital audio signal, an allocation signal indicating the allocation of data capacity for compressing the digital audio signal is output on the basis of the acoustic information amount, a requantized signal is generated from the digital audio signal according to the allocation signal, and a coded bitstream is formed from the requantized signal, characterized in that the coding rate for a coding unit is increased when the acoustic information amount of that coding unit increases and is decreased when the acoustic information amount decreases, and in that the coding rate is controlled so that the average of the coding rates within an audio group formed of P coding units (P is a natural number of 2 or more) equals a predetermined designated coding rate.

3. The audio coding method according to claim 1, characterized in that, in calculating the acoustic information amount, the input digital audio is divided into N bands (N is a natural number of 3 or more) on the frequency axis, the difference SMR(i) between the signal power S(i) within band i (i runs from 0 for the lowest band to N-1 for the highest band) and the mask power M(i) obtained by psychoacoustic analysis is summed from a band j (j is 0 or more and N-2 or less) to a band k (k is 1 or more and N-1 or less), and the sum is obtained as the acoustic information amount for each coding unit.

4. The speech coding method according to claim 1, wherein the input digital audio signal is divided into N bands on the frequency axis (N is a natural number of 3 or more), the difference SMR(i) between the signal power S(i) in band i (i runs from 0 for the lowest band to N-1 for the highest band) and the mask power M(i) obtained by psychoacoustic analysis is obtained, SMR(i) is multiplied by a real number K(i) chosen so that K(i) is equal to or greater than K(i+1), the products K(i) × SMR(i) are summed from a band j (j is 0 or more and N-2 or less) to a band k (k is 1 or more and N-1 or less), and the sum is taken as the acoustic information amount for each coding unit.

5. The speech coding method according to claim 1, wherein the input digital audio signal is divided into N bands on the frequency axis (N is a natural number of 3 or more), the signal power S(i) in band i (i runs from 0 for the lowest band to N-1 for the highest band) is summed from a band j (j is 0 or more and N-2 or less) to a band k (k is 1 or more and N-1 or less), and the sum is taken as the acoustic information amount for each coding unit.

6. The speech coding method according to claim 1, wherein the input digital audio signal is divided into N bands on the frequency axis (N is a natural number of 3 or more), the signal power S(i) in band i (i runs from 0 for the lowest band to N-1 for the highest band) is multiplied by a real number K(i) chosen so that K(i) is equal to or greater than K(i+1), the products K(i) × S(i) are summed from a band j (j is 0 or more and N-2 or less) to a band k (k is 1 or more and N-1 or less), and the sum is taken as the acoustic information amount for each coding unit.
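The four ways of obtaining the acoustic information amount in claims 3 to 6 differ only in whether the mask power M(i) is subtracted and whether the weights K(i) are applied. The sketch below covers all four under stated assumptions: powers are given per band as plain numbers (for example in dB), j is strictly less than k, and the function name `acoustic_information` and its parameters are hypothetical names introduced for this example rather than terms from the specification.

```python
def acoustic_information(signal_power, j, k, mask_power=None, weights=None):
    """Sum a per-band quantity from band j to band k inclusive.

    signal_power : S(i) for bands 0 .. N-1 (lowest to highest)
    mask_power   : M(i); if given, SMR(i) = S(i) - M(i) is summed (claims 3, 4)
                   instead of S(i) (claims 5, 6)
    weights      : K(i) with K(i) >= K(i+1); if given, each term is multiplied
                   by K(i) (claims 4, 6)
    """
    n = len(signal_power)
    if n < 3:
        raise ValueError("N must be 3 or more")
    # j < k is an assumption of this sketch; the claims only bound j and k separately.
    if not (0 <= j <= n - 2 and 1 <= k <= n - 1 and j < k):
        raise ValueError("need 0 <= j <= N-2, 1 <= k <= N-1 and j < k")
    if weights is not None and any(weights[i] < weights[i + 1] for i in range(n - 1)):
        raise ValueError("weights must satisfy K(i) >= K(i+1)")

    total = 0.0
    for i in range(j, k + 1):
        value = signal_power[i]
        if mask_power is not None:
            value -= mask_power[i]   # SMR(i) = S(i) - M(i)
        if weights is not None:
            value *= weights[i]      # K(i) x SMR(i) or K(i) x S(i)
        total += value
    return total


if __name__ == "__main__":
    S = [60.0, 50.0, 40.0, 30.0]
    M = [40.0, 40.0, 35.0, 30.0]
    K = [1.0, 0.5, 0.5, 0.25]
    print(acoustic_information(S, 0, 3, mask_power=M))             # claim 3
    print(acoustic_information(S, 0, 3, mask_power=M, weights=K))  # claim 4
    print(acoustic_information(S, 1, 2))                           # claim 5
    print(acoustic_information(S, 1, 2, weights=K))                # claim 6
```

Because K(i) does not increase with the band index, the weighted variants give lower bands at least as much influence on the result as higher bands.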

7. A speech coding method, being an audio compression coding method in which a digital audio signal is input in predetermined coding units, an acoustic information amount is calculated by estimating the psychoacoustic information content of the digital audio signal, an allocation signal indicating the allocation of data capacity for compressing the digital audio signal is output based on the acoustic information amount, a requantized signal is generated from the digital audio signal according to the allocation signal, and an encoded bitstream is formed from the requantized signal, characterized in that the coding rate for the coding unit is determined so as to secure spare data capacity of a constant margin amount MARG (MARG is a real number) relative to the acoustic information amount, an average value AVE of the coding rates of the coding units up to the present time is calculated, the constant margin amount MARG is increased when the average value AVE falls below a predetermined low rate limit value LOW, and the constant margin amount MARG is reduced when the average value AVE rises above a predetermined high rate limit value HIGH.
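The margin control of claim 7 can be pictured as a small feedback loop. The following is a minimal sketch under assumptions the claim does not state: the acoustic information amount is already expressed in the same units as the coding rate, the rate is simply the information amount plus MARG, and MARG is changed by a fixed `step`. The class name `MarginRateController` and the `step` parameter are hypothetical.

```python
class MarginRateController:
    """Keep the running average rate AVE between LOW and HIGH by adjusting MARG."""

    def __init__(self, marg, low, high, step):
        if low > high:
            raise ValueError("LOW must not exceed HIGH")
        self.marg = marg    # constant margin amount MARG
        self.low = low      # low rate limit value LOW
        self.high = high    # high rate limit value HIGH
        self.step = step    # assumed fixed adjustment applied to MARG
        self._total = 0.0   # sum of the coding rates issued so far
        self._count = 0     # number of coding units processed so far

    def next_rate(self, info_amount):
        """Return the coding rate for one coding unit, then update MARG."""
        rate = info_amount + self.marg          # assumed rate rule for this sketch
        self._total += rate
        self._count += 1
        ave = self._total / self._count         # average value AVE up to now
        if ave < self.low:
            self.marg += self.step              # average too low: widen the margin
        elif ave > self.high:
            self.marg -= self.step              # average too high: narrow the margin
        return rate


if __name__ == "__main__":
    ctrl = MarginRateController(marg=10.0, low=100.0, high=140.0, step=2.0)
    for info in (50.0, 200.0, 200.0):
        print(ctrl.next_rate(info), ctrl.marg)
```

Unlike the group-based control of claim 2, this loop needs no look-ahead over P coding units, since it only uses the rates already issued.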

8. A speech coding apparatus, being an audio compression coding apparatus comprising: acoustic information amount calculating means for receiving a digital audio signal in predetermined coding units, estimating the psychoacoustic information content of the digital audio signal, and outputting it as an acoustic information amount; allocation means for outputting, based on the acoustic information amount, an allocation signal indicating the allocation of data capacity for compressing the digital audio signal; requantization means for generating a requantized signal from the digital audio signal according to the allocation signal; and stream forming means for forming an encoded bitstream from the requantized signal, characterized by further comprising coding rate distribution means for adjusting the distribution of coding rates so that the coding rate for a coding unit is raised when the acoustic information amount for that coding unit increases and is lowered when the acoustic information amount decreases.

9. A speech coding apparatus, being an audio compression coding apparatus comprising: acoustic information amount calculating means for receiving a digital audio signal in predetermined coding units, estimating the psychoacoustic information content of the digital audio signal, and outputting it as an acoustic information amount; allocation means for outputting, based on the acoustic information amount, an allocation signal indicating the allocation of data capacity for compressing the digital audio signal; requantization means for generating a requantized signal from the digital audio signal according to the allocation signal; and stream forming means for forming an encoded bitstream from the requantized signal, characterized by further comprising coding rate distribution means for raising the coding rate for a coding unit when the acoustic information amount for that coding unit increases and lowering it when the acoustic information amount decreases, and for adjusting the coding rates so that their average within a speech group consisting of a predetermined number P of coding units (P is a natural number of 2 or more) equals a predetermined designated coding rate.

10. A speech coding apparatus, being an audio compression coding apparatus comprising: acoustic information amount calculating means for receiving a digital audio signal in predetermined coding units, estimating the psychoacoustic information content of the digital audio signal, and outputting it as an acoustic information amount; allocation means for outputting, based on the acoustic information amount, an allocation signal indicating the allocation of data capacity for compressing the digital audio signal; requantization means for generating a requantized signal from the digital audio signal according to the allocation signal; and stream forming means for forming an encoded bitstream from the requantized signal, characterized by further comprising: coding rate distribution means for raising the coding rate for a coding unit when the acoustic information amount for that coding unit increases and lowering it when the acoustic information amount decreases, and for setting coding rates adjusted so that their average within a speech group consisting of a predetermined number P of coding units (P is a natural number of 2 or more) equals a predetermined designated coding rate; and coding rate conversion means for reconstructing the encoded bitstream into an encoded bitstream that conforms to the adjusted coding rates.

11. A speech coding apparatus, being an audio compression coding apparatus comprising: acoustic information amount calculating means for receiving a digital audio signal in predetermined coding units, estimating the psychoacoustic information content of the digital audio signal, and outputting it as an acoustic information amount; allocation means for outputting, based on the acoustic information amount, an allocation signal indicating the allocation of data capacity for compressing the digital audio signal; requantization means for generating a requantized signal from the digital audio signal according to the allocation signal; and stream forming means for forming an encoded bitstream from the requantized signal, characterized by further comprising: data holding means for accumulating a predetermined number P of coding units (P is a natural number of 2 or more) of the digital audio signal; and coding rate distribution means for raising the coding rate for a coding unit when the acoustic information amount for that coding unit increases and lowering it when the acoustic information amount decreases, and for setting coding rates adjusted so that their average within a speech group consisting of the P coding units equals a predetermined designated coding rate, wherein the requantization means operates by reading the digital audio signal out of the data holding means in time with the adjusted coding rate output from the coding rate distribution means.

12. A speech coding apparatus, being an audio compression coding apparatus comprising: acoustic information amount calculating means for receiving a digital audio signal in predetermined coding units, estimating the psychoacoustic information content of the digital audio signal, and outputting it as an acoustic information amount; allocation means for outputting, based on the acoustic information amount, an allocation signal indicating the allocation of data capacity for compressing the digital audio signal; requantization means for generating a requantized signal from the digital audio signal according to the allocation signal; and stream forming means for forming an encoded bitstream from the requantized signal, characterized by further comprising: coding rate determining means for determining the coding rate for the coding unit so as to secure spare data capacity of a constant margin amount MARG (MARG is a real number) relative to the acoustic information amount; coding rate average value calculating means for obtaining an average value AVE of the coding rates of the coding units up to the present time; and margin adjusting means for increasing the constant margin amount MARG when the average value AVE falls below a predetermined low rate limit value LOW and for reducing the constant margin amount MARG when the average value AVE rises above a predetermined high rate limit value HIGH.