JP2008309875A

JP2008309875A - Speech encoding device, speech decoding device, speech encoding method, speech decoding method, and program

Info

Publication number: JP2008309875A
Application number: JP2007155308A
Authority: JP
Inventors: Hiroyasu Ide; 博康井手
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2007-06-12
Filing date: 2007-06-12
Publication date: 2008-12-25
Anticipated expiration: 2027-06-12
Also published as: JP5098453B2

Abstract

PROBLEM TO BE SOLVED: To achieve high-speed speech encoding and decoding suitable for high-quality speech reproduction by taking the characteristics of speech signals and of human hearing into account.

SOLUTION: A speech encoding and decoding device 111 functioning as a speech encoding device converts input speech, by a modified discrete cosine transform (MDCT), into a spectrum made up of MDCT coefficients. The time dependence of the MDCT coefficients is expressed as a difference or a ratio for each prescribed medium-section band, and information based on the difference or the ratio is entropy-encoded and transmitted to another speech encoding and decoding device 111 functioning as a speech decoding device. When speech is encoded, a CPU 121 calculates the difference or the ratio at a given time; when speech is decoded, the spectrum is restored on the basis of the difference or the ratio. In both cases the CPU 121 needs information based on the MDCT coefficients of the immediately preceding time, and this information is stored in a storage section 125.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a speech encoding device, a speech decoding device, a speech encoding method, a speech decoding method, and a program, which are needed to carry out speech compression and decompression that takes auditory characteristics into account.

In speech communication over a channel of limited capacity, speech encoding and decoding must be designed so that speech of the highest possible quality can be restored from as little data as possible.

One approach is to make effective use of the characteristics of human hearing.

One known speech encoding method that takes auditory characteristics into account converts a speech signal into a spectrum and then divides the spectrum into a plurality of subbands while considering the critical bands derived from auditory characteristics (see, for example, Patent Document 1 and Non-Patent Document 1).

In such a method, the signal value, the masking amount, noise, and the like are taken into account for each of the subbands described above, and encoding is performed after the number of bits required for encoding has been calculated.

Patent Document 1: JP 7-46137 A
Non-Patent Document 1: JIS standard number JIS X 4323, "1.5 Mbit/s encoding of moving image signals and accompanying sound signals for digital recording media - Part 3: Sound", p. 96 [online], [retrieved August 7, 2006], Internet (URL: http://www.jisc.go.jp/app/pager?id=22028)

However, in such a method the procedure for calculating the number of bits required for encoding is complicated and involves many computation steps. One reason is that the masking amount is not easy to calculate.

Consequently, adopting such a method increases the processing load on an arithmetic unit, such as a CPU, inside the encoding device, and may reduce the processing speed. In applications such as mobile phones, for example, this makes real-time two-way conversation difficult.

There is therefore a need for an encoding and decoding device that performs speech encoding and decoding at high speed while taking auditory characteristics into account, so that real-time calls and the like can be conducted with sound quality that poses no practical problem.

The present invention has been made in view of the above circumstances. Its object is to provide a speech encoding device, a speech decoding device, a speech encoding method, a speech decoding method, and a program that, where communication capacity is constrained, shorten the code length in encoding by exploiting the continuity and stationarity of speech signals, speed up processing through band-by-band signal processing that takes auditory characteristics into account, and in decoding restore speech of practically acceptable quality at high speed.

To achieve the above object, a speech encoding device according to a first aspect of the present invention comprises:
discrete spectrum conversion means for obtaining, for a digital speech signal and for each predetermined time segment, the value of a frequency component for each small-section band having a predetermined bandwidth;
maximum value search means for searching, for each predetermined time segment and for each medium-section band made up of a predetermined number of consecutive small-section bands set in advance in accordance with auditory characteristics, for the maximum of the frequency component values belonging to that medium-section band;
maximum value storage means for storing the maximum value found by the maximum value search means; and
encoding means for quantizing, entropy-encoding, and outputting information generated on the basis of the maximum value and the frequency components normalized by division by the maximum value,
wherein the maximum value search means,
for each predetermined time segment, stores in the maximum value storage means a current maximum value, which is the maximum value found in that time segment, acquires from the maximum value storage means a past maximum value, which is the maximum value stored there in a predetermined time segment earlier than that time segment, and converts the current maximum value into a value associated with the past maximum value.
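The first aspect can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: the function name `encode_segment`, the use of magnitudes when searching for the maximum, the handling of an all-zero band, and the choice of a plain difference as the "value associated with the past maximum value" are all assumptions. Quantization and entropy encoding are left out.

```python
def encode_segment(coeffs, band_edges, past_max):
    """Sketch of one time segment of the encoder.

    coeffs     : frequency-component values, one per small-section band
    band_edges : (start, stop) index pairs, one per medium-section band
    past_max   : per-band maxima stored for the previous time segment
    """
    current_max = []
    normalized = []
    for start, stop in band_edges:
        band = coeffs[start:stop]
        # Assumption: the "maximum" is taken over magnitudes.
        peak = max(abs(c) for c in band)
        current_max.append(peak)
        # Assumption: an all-zero band is passed through as zeros.
        normalized.extend(c / peak if peak else 0.0 for c in band)
    # One possible way to tie the current maximum to the past one:
    # the difference between the two.
    relative = [cur - old for cur, old in zip(current_max, past_max)]
    # current_max is returned so the caller can store it for the next segment.
    return relative, normalized, current_max
```

The caller would keep `current_max` and pass it back in as `past_max` for the next time segment, which plays the role of the maximum value storage means.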

Because a speech signal is continuous and stationary, associating the current maximum value with the past maximum value biases the distribution of the information. Since this biased information is then entropy-encoded, it can be encoded with high efficiency.

It is desirable that the device further comprise medium-section band configuration means that assigns medium-section identification integers to the medium-section bands in ascending order of frequency and configures the medium-section bands so that the logarithm of the center frequency of each medium-section band depends linearly on its medium-section identification integer.

Human hearing is more sensitive to slight differences in frequency the lower the frequency of the sound, and this sensitivity varies logarithmically with frequency. Including such medium-section band configuration means therefore suits a speech encoding device that takes auditory characteristics into account.
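A band layout with this property can be sketched as below. The function names, the frequency range used in the example, and the choice of the geometric mean as the "center frequency" are assumptions made for illustration; the text only requires that the logarithm of the center frequency be linear in the band's identification integer.

```python
import math


def log_band_edges(f_low, f_high, n_bands):
    """Band edges spaced by a constant frequency ratio."""
    step = (f_high / f_low) ** (1.0 / n_bands)
    return [f_low * step ** j for j in range(n_bands + 1)]


def center_frequencies(edges):
    """Assumption: the center of a band is the geometric mean of its edges."""
    return [math.sqrt(lo * hi) for lo, hi in zip(edges, edges[1:])]
```

With edges in a constant ratio, log(center_j) increases by the same amount from one band index j to the next. In a real encoder the edges would have to be snapped to the boundaries of the small bands, so the linear relation would hold only approximately.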

The maximum value search means may obtain a difference, which is the value obtained by subtracting the past maximum value from the current maximum value, and the encoding means may quantize, entropy-encode, and output the difference and the normalized frequency components.

Because a speech signal is continuous and stationary, the values that occur as the difference are biased toward small values compared with the values that occur as the current maximum value itself. Biased information is therefore entropy-encoded, and encoding can be performed with high efficiency.

Alternatively, the maximum value search means may obtain a ratio, which is the value obtained by dividing the current maximum value by the past maximum value, and the encoding means may quantize, entropy-encode, and output the ratio and the normalized frequency components.

Because the values that occur as the ratio are biased toward the vicinity of 1, encoding can be performed with high efficiency.
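The effect of the difference and the ratio can be illustrated with a toy calculation. The sequence `maxima` below is invented purely for illustration (it stands for the quantized maximum of one band over successive time segments), and the empirical entropy is used here only as a rough stand-in for the cost of entropy encoding.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Empirical entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def differences(maxima):
    """Current maximum minus past maximum, segment by segment."""
    return [cur - old for old, cur in zip(maxima, maxima[1:])]


def ratios(maxima):
    """Current maximum divided by past maximum (assumes nonzero maxima)."""
    return [cur / old for old, cur in zip(maxima, maxima[1:])]


# Invented toy data: a slowly rising band maximum.
maxima = [10, 11, 12, 12, 13, 14, 14, 15, 16]
```

For this slowly varying sequence the differences take only two distinct values, so their empirical entropy is lower than that of the raw maxima, and the ratios all stay close to 1.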

The device may further comprise maximum difference determination means, in which case the maximum value search means obtains a difference, which is the value obtained by subtracting the past maximum value from the current maximum value, the maximum difference determination means obtains a maximum difference, which is the largest of the differences that the maximum value search means obtained for all of the medium-section bands, and the encoding means quantizes, entropy-encodes, and outputs the maximum difference and the normalized frequency components.

Because only the maximum difference is encoded as information about the difference, the amount of code is small.

Alternatively, the device may further comprise maximum ratio determination means, in which case the maximum value search means obtains a ratio, which is the value obtained by dividing the current maximum value by the past maximum value, the maximum ratio determination means obtains a maximum ratio, which is the largest of the ratios that the maximum value search means obtained for all of the medium-section bands, and the encoding means quantizes, entropy-encodes, and outputs the maximum ratio and the normalized frequency components.

Because only the maximum ratio is encoded as information about the ratio, the amount of code is small. Moreover, because the spectral shape of a speech signal often changes over time while remaining similar in shape, the loss of accuracy in encoding is suppressed.
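A minimal sketch of the two reductions follows. The function names are hypothetical, and the sketch assumes nonzero past maxima for the ratio.

```python
def max_difference(current_max, past_max):
    """Largest per-band difference between current and past maxima."""
    return max(cur - old for cur, old in zip(current_max, past_max))


def max_ratio(current_max, past_max):
    """Largest per-band ratio of current to past maxima.

    Assumption: every past maximum is nonzero.
    """
    return max(cur / old for cur, old in zip(current_max, past_max))
```

When the spectrum keeps its shape and only its overall level changes, every band has the same ratio, so the single maximum ratio describes all bands exactly. This is the case in which sending one value per time segment loses no information.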

The discrete spectrum conversion means uses, for example, the MDCT (Modified Discrete Cosine Transform).

To achieve the above object, a speech decoding device according to a second aspect of the present invention comprises:
receiving means for receiving a code generated by entropy-encoding modified spectrum data, which is the result of applying a modification including band-by-band normalization to the quantized spectrum of a speech signal for each predetermined time segment, together with a normalization value, which is the value used for the normalization;
decoding means for decoding, from the code and for each predetermined time segment, the modified spectrum data and the normalization value by a decoding method corresponding to the entropy encoding;
inverse modification means for restoring, for each predetermined time segment, the quantized spectrum from the decoded modified spectrum data by using the decoded normalization value;
normalization value storage means for storing the normalization value; and
discrete spectrum inverse conversion means for restoring the speech signal from the restored quantized spectrum,
wherein the inverse modification means,
for each predetermined time segment, stores in the normalization value storage means a current normalization value, which is the normalization value decoded in that time segment, acquires a past normalization value, which is the normalization value stored in the normalization value storage means in a predetermined time segment earlier than that time segment, and restores the quantized spectrum on the basis of the current normalization value and the past normalization value.
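The inverse modification can be sketched as follows, under the assumption that the decoded normalization value is a per-band difference from the previous time segment and that the decoder stores the rebuilt absolute value. The function name `decode_segment` and this storage choice are assumptions; entropy decoding and the inverse transform are left out.

```python
def decode_segment(relative, normalized, band_edges, past_norm):
    """Sketch of one time segment of the inverse modification.

    relative   : decoded per-band values, assumed to be differences
    normalized : decoded spectrum data, normalized band by band
    band_edges : (start, stop) index pairs, one per band
    past_norm  : absolute normalization values kept from the previous segment
    """
    # Rebuild the absolute normalization value from the stored past value.
    current_norm = [old + diff for old, diff in zip(past_norm, relative)]
    spectrum = []
    for (start, stop), scale in zip(band_edges, current_norm):
        # Undo the normalization by multiplying back the band's scale.
        spectrum.extend(v * scale for v in normalized[start:stop])
    # current_norm is returned so the caller can store it for the next segment.
    return spectrum, current_norm
```

If the encoder sent ratios instead of differences, the addition on the first line of the body would become a multiplication.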

To achieve the above object, a speech encoding method according to a third aspect of the present invention comprises:
a discrete spectrum conversion step of obtaining, for a digital speech signal and for each predetermined time segment, the value of a frequency component for each small-section band having a predetermined bandwidth;
a maximum value search step of searching, for each predetermined time segment and for each medium-section band made up of a predetermined number of consecutive small-section bands set in advance in accordance with auditory characteristics, for the maximum of the frequency component values belonging to that medium-section band;
a maximum value storage step of storing the maximum value found in the maximum value search step; and
an encoding step of quantizing, entropy-encoding, and outputting information generated on the basis of the maximum value and the frequency components normalized by division by the maximum value,
wherein, in the maximum value search step,
for each predetermined time segment, when a current maximum value, which is the maximum value found in that time segment, is stored by the maximum value storage step, a past maximum value, which is the maximum value stored by an earlier maximum value storage step in a predetermined time segment earlier than that time segment, is acquired, and the current maximum value is converted into a value associated with the past maximum value.

To achieve the above object, a speech decoding method according to a fourth aspect of the present invention comprises:
a receiving step of receiving a code generated by entropy-encoding modified spectrum data, which is the result of applying a modification including band-by-band normalization to the quantized spectrum of a speech signal for each predetermined time segment, together with a normalization value, which is the value used for the normalization;
a decoding step of decoding, from the code and for each predetermined time segment, the modified spectrum data and the normalization value by a decoding method corresponding to the entropy encoding;
an inverse modification step of restoring, for each predetermined time segment, the quantized spectrum from the decoded modified spectrum data by using the decoded normalization value;
a normalization value storage step of storing the normalization value; and
a discrete spectrum inverse conversion step of restoring the speech signal from the restored quantized spectrum,
wherein, in the inverse modification step,
for each predetermined time segment, when a current normalization value, which is the normalization value decoded in that time segment, is stored by the normalization value storage step, a past normalization value, which is the normalization value stored by an earlier normalization value storage step in a predetermined time segment earlier than that time segment, is acquired, and the quantized spectrum is restored on the basis of the current normalization value and the past normalization value.

To achieve the above object, a program according to a fifth aspect of the present invention causes a computer to execute:
a discrete spectrum conversion step of obtaining, for a digital speech signal and for each predetermined time segment, the value of a frequency component for each small-section band having a predetermined bandwidth;
a maximum value search step of searching, for each predetermined time segment and for each medium-section band made up of a predetermined number of consecutive small-section bands set in advance in accordance with auditory characteristics, for the maximum of the frequency component values belonging to that medium-section band;
a maximum value storage step of storing the maximum value found in the maximum value search step; and
an encoding step of quantizing, entropy-encoding, and outputting information generated on the basis of the maximum value and the frequency components normalized by division by the maximum value,
wherein, in the maximum value search step,
for each predetermined time segment, when a current maximum value, which is the maximum value found in that time segment, is stored by the maximum value storage step, a past maximum value, which is the maximum value stored by an earlier maximum value storage step in a predetermined time segment earlier than that time segment, is acquired, and the current maximum value is converted into a value associated with the past maximum value.

To achieve the above object, a program according to a sixth aspect of the present invention causes a computer to execute:
a receiving step of receiving a code generated by entropy-encoding modified spectrum data, which is the result of applying a modification including band-by-band normalization to the quantized spectrum of a speech signal for each predetermined time segment, together with a normalization value, which is the value used for the normalization;
a decoding step of decoding, from the code and for each predetermined time segment, the modified spectrum data and the normalization value by a decoding method corresponding to the entropy encoding;
an inverse modification step of restoring, for each predetermined time segment, the quantized spectrum from the decoded modified spectrum data by using the decoded normalization value;
a normalization value storage step of storing the normalization value; and
a discrete spectrum inverse conversion step of restoring the speech signal from the restored quantized spectrum,
wherein, in the inverse modification step,
for each predetermined time segment, when a current normalization value, which is the normalization value decoded in that time segment, is stored by the normalization value storage step, a past normalization value, which is the normalization value stored by an earlier normalization value storage step in a predetermined time segment earlier than that time segment, is acquired, and the quantized spectrum is restored on the basis of the current normalization value and the past normalization value.

According to the present invention, a speech signal is processed band by band with the characteristics of both speech signals and human hearing taken into account. Speech signals can therefore be encoded and decoded quickly and with a light processing load while high sound quality is maintained.

A speech encoding device and a speech decoding device according to embodiments of the present invention are described in detail below.

For the user's convenience, the speech encoding device and the speech decoding device are assumed to be integrated into a single device, a speech encoding and decoding device.

(Embodiment 1)
FIG. 1 shows a speech encoding and decoding device 111 according to this embodiment. The device is assumed to be, for example, a mobile phone.

The speech encoding and decoding device 111 includes a CPU 121, a ROM (Read Only Memory) 123, a storage unit 125, a speech processing unit 141, a wireless communication unit 161, and an operation key input content processing unit 171, which are connected to one another by a system bus 181. The system bus 181 is a transmission path for transferring instructions and data.

The ROM 123 stores an operation program for speech encoding and decoding.

The storage unit 125 consists of a RAM (Random Access Memory) 131 and a hard disk 133, and stores digital speech signals, MDCT coefficients, the maximum MDCT coefficient for each band, the amount of change in that maximum over each predetermined time interval, and the like. In this embodiment in particular, the speech encoding and decoding device 111 needs, for processing at a given time, information based on the speech signal of the immediately preceding time, in both encoding and decoding. The storage unit 125 therefore plays an important role as a delay buffer memory that holds such information, if only temporarily.

The speech encoding and decoding device 111 further includes a microphone 151, a speaker 153, an antenna 163, and operation keys 173.

The microphone 151 picks up the speech of the user on the transmitting (encoding) side and passes it to the speech processing unit 141. The speaker 153 outputs the restored speech passed from the speech processing unit 141 to the user on the receiving (decoding) side. The antenna 163 transmits the code passed from the wireless communication unit 161 as a radio signal to the speech encoding and decoding device 111 on the receiving (decoding) side, and receives radio signals transmitted from the speech encoding and decoding device 111 on the transmitting (encoding) side and passes them to the wireless communication unit 161. The operation keys 173 convey the user's intentions to the device 111, for example when the user chooses to change preset initial values, such as the boundary frequencies of the various bands used for signal processing, or when the user on the transmitting (encoding) side specifies the device 111 on the receiving (decoding) side that is the other party to the call.

The speech processing unit 141, the wireless communication unit 161, and the operation key input content processing unit 171 are under the control of the CPU 121 via the system bus 181.

Speech input to the microphone 151 is converted into a digital speech signal by an A/D converter (not shown) inside the speech processing unit 141, for example by 16 kHz sampling and 16-bit quantization.

The speech processing unit 141 time-divides the digital speech signal into frames, the basic processing unit of speech signal compression, and sends them one after another to the storage unit 125.

As described later, the digital speech signal of one frame is handled as a single unit as it is stored in the storage unit 125, converted into the frequency domain by the CPU 121, passed to the wireless communication unit 161, and transmitted wirelessly by the antenna 163.

Suppose, for example, that the signal of a frame in the storage unit 125 has been processed by the CPU 121 and passed in full to the wireless communication unit 161. The data relating to that frame's signal is then deleted from the storage unit 125, and the signal of the next frame is passed from the speech processing unit 141 to the storage unit 125.

In this way, as long as speech continues to be input, signal processing proceeds frame after frame without idle time. This chained processing makes possible the real-time processing of speech signals that a mobile phone requires.

As noted above, however, the frame is only the basic processing unit. In this embodiment, as described later, processing that focuses on the difference between the digital speech signals of two frames adjacent on the time axis is executed in addition to the frame-by-frame processing, so in this sense two frames form the basic processing unit.
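The frame-by-frame chain, with information carried over from the preceding frame, can be pictured with the following sketch. The names `process_stream` and `peak_change` are hypothetical, and the per-frame handler (the change in peak magnitude between adjacent frames) is only a stand-in for the real per-frame processing.

```python
def process_stream(samples, frame_len, handle_frame):
    """Process complete frames in order, carrying state between frames."""
    state = None  # information kept from the previous frame
    outputs = []
    # Assumption: a trailing partial frame is simply ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        out, state = handle_frame(frame, state)
        outputs.append(out)
    return outputs


def peak_change(frame, previous_peak):
    """Stand-in handler: change in peak magnitude from the previous frame."""
    peak = max(abs(s) for s in frame)
    change = None if previous_peak is None else peak - previous_peak
    return change, peak
```

Only the previous frame's state is kept, which mirrors the way a frame's data is discarded once it has been handed on, while the information needed for the next frame stays in the buffer.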

To make the explanation easier to follow, it is first assumed below that speech is input to the microphone 151 only for the duration of one frame, corresponding to a certain time t.

Suppose that one frame consists of M signal values and that the speech signal input to the microphone 151 has been converted by the speech processing unit 141 into the digital speech signal x_0, ..., x_{M-1} and passed to the storage unit 125. Data is moved between the components of the device 111 over the system bus 181 as instructed by the CPU 121. The CPU 121 issues its instructions in accordance with the operation program stored in the ROM 123.

The digital speech signal x_0, ..., x_{M-1} stored in the storage unit 125 is loaded into one of the general-purpose registers (not shown) of the CPU 121. The CPU 121 converts the digital speech signal x_0, ..., x_{M-1}, which is a real-time-domain signal, into the frequency-domain signal X_0, ..., X_{M/2-1} and stores it in a general-purpose register. Any conversion method may be used as long as it converts a real-time-domain signal into a frequency-domain signal, but the modified discrete cosine transform (MDCT) is preferable because the converted values have no imaginary part and are therefore easy to handle.

M signal values in the real time domain correspond to M/2 frequency conversion coefficient values in the frequency domain, as above, because the MDCT is used for the frequency conversion. With other frequency conversion methods, the ratio of the number of data in the real time domain to the number in the frequency domain is not necessarily 2:1; in that case the following explanation applies as is, provided the index of the last frequency coefficient is read accordingly.
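The M-to-M/2 mapping can be seen in a direct evaluation of the textbook MDCT definition, sketched below. This is the standard unwindowed formula with no scaling factor. The patent text does not state its exact normalization or window, so treat both as assumptions. The direct double loop is slow and is meant only to show the shape of the computation.

```python
import math


def mdct(block):
    """Direct MDCT of M real samples (M assumed even) into M/2 real coefficients."""
    m = len(block)
    half = m // 2
    coeffs = []
    for k in range(half):
        acc = 0.0
        for n, x in enumerate(block):
            # Textbook kernel: cos[(pi / (M/2)) * (n + 1/2 + M/4) * (k + 1/2)]
            acc += x * math.cos(math.pi / half * (n + 0.5 + half / 2.0) * (k + 0.5))
        coeffs.append(acc)
    return coeffs
```

Every coefficient is a real number, and a block of M samples yields exactly M/2 of them, which matches the 2:1 ratio described in the text.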

FIG. 2(a) schematically shows the MDCT coefficients generated in this way, and FIG. 2(b) is an enlarged view of part of it. Because the MDCT is a discrete frequency transform, one frequency transform coefficient is assigned to each of the M/2 small partition bands obtained by dividing the frequency axis. As shown in the figure, the (k+1)-th small partition band counted from the low-frequency side is given the number k and is assigned the frequency transform coefficient X_k (where 0 ≤ k ≤ M/2-1). X_k is called an MDCT coefficient.

One MDCT is performed for each time segment of finite length. Such a time segment is called an MDCT block, and the number of signal samples contained in one MDCT block is called the order of the MDCT. A suitable MDCT order is, for example, 512.

Because the frame is the processing unit of audio compression, the duration of an MDCT block must, in principle, not exceed the duration of one frame. A frame may, however, contain multiple MDCT blocks; for example, one frame preferably contains four MDCT blocks.

Here, however, to keep the explanation focused on the essence of the invention, one frame is assumed to correspond one-to-one to one MDCT block. In the schematic diagrams of MDCT coefficients in FIG. 2 and later figures, one frame contains M time-domain signal values, so the order of the MDCT is M.

In FIG. 2 and later figures, the MDCT coefficients are drawn as if they were all positive, but this is only for ease of understanding. Actual MDCT coefficients may be negative. In that case any known technique may be used, such as providing a bit that represents the sign. As noted, the figures of MDCT coefficients from FIG. 2 onward are schematic diagrams for explanation only.

To make the subsequent processing straightforward, the CPU 121 relabels the MDCT coefficients X_k (0 ≤ k ≤ M/2-1) held in the general-purpose register. The CPU 121 performs this relabeling according to the operating program read from the ROM 123. Specifically, each MDCT coefficient is re-identified by two symbols in addition to the time t, as follows.

First, as shown in FIG. 3(a), the entire frequency range is divided into ω_MaxRANGE medium partition bands, which are numbered 1, 2, ..., ω_MaxRANGE from the low-frequency side to distinguish them.

This band number is one of the new symbols used to identify an MDCT coefficient.

The frequency range is divided so that the logarithm of the center frequency of each medium partition band depends linearly on the band number. In other words, the operating program that the CPU 121 reads from the ROM 123 contains instructions that produce such a division. With this division, medium partition bands at higher frequencies have wider bandwidths, as shown schematically in FIG. 3(a).

The division is based on the logarithm because the sensitivity of human hearing to differences in frequency falls off logarithmically toward higher frequencies. To convey the audio signal as effectively as possible within a limited communication capacity, it is therefore appropriate to reproduce low-frequency components in detail to preserve playback quality, while conveying only coarse information for high-frequency components, so that the total amount of information is reduced.

For example, when the audio processing unit 141 converts the sound input to the microphone 151 into a digital signal at a sampling frequency of 16 kHz, the operating program stored in the ROM 123 preferably provides 11 medium partition bands, with the boundaries between them set at 187.5 Hz, 437.5 Hz, 687.5 Hz, 937.5 Hz, 1312.5 Hz, 1687.5 Hz, 2312.5 Hz, 3250 Hz, 4625 Hz, and 6500 Hz.
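The boundaries listed above can be turned into a count of MDCT coefficients per medium partition band. The following sketch does this under two assumptions that the text does not state together: that the MDCT order is 512 (the value given earlier as suitable), and that coefficient k covers the frequency interval from k to k+1 times the bin width. The function name `band_sizes` is hypothetical.

```python
SAMPLE_RATE_HZ = 16000
MDCT_ORDER = 512  # assumption: one block of 512 samples, giving 256 coefficients
BOUNDARIES_HZ = [187.5, 437.5, 687.5, 937.5, 1312.5,
                 1687.5, 2312.5, 3250.0, 4625.0, 6500.0]


def band_sizes(sample_rate_hz, mdct_order, boundaries_hz):
    """Return the number of MDCT coefficients in each medium partition band."""
    n_coeffs = mdct_order // 2
    bin_width_hz = (sample_rate_hz / 2) / n_coeffs  # 31.25 Hz for the values above
    # Band edges expressed as coefficient indices, from 0 up to n_coeffs.
    edges = [0] + [round(b / bin_width_hz) for b in boundaries_hz] + [n_coeffs]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


q = band_sizes(SAMPLE_RATE_HZ, MDCT_ORDER, BOUNDARIES_HZ)
print(q)  # -> [6, 8, 8, 8, 12, 12, 20, 30, 44, 60, 48]
```

Under these assumptions every boundary falls exactly on a coefficient edge, and the band widths grow toward high frequencies as the text describes.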

Next, for each MDCT coefficient, its position counted from the low-frequency side within the medium partition band to which it belongs is determined. Let the medium partition band numbered ω_RANGE (1 ≤ ω_RANGE ≤ ω_MaxRANGE) contain q(ω_RANGE) MDCT coefficients.

Each MDCT coefficient is then specified by two symbols: the medium partition band to which it belongs, and its position within that band counted from the low-frequency side. That is, the MDCT coefficients that were previously distinguished across the whole frequency range by the numbers 0 through M/2-1, as shown in FIG. 2(b), are now distinguished by writing those belonging to the ω_RANGE-th medium partition band (1 ≤ ω_RANGE ≤ ω_MaxRANGE) at time t as X(ω_RANGE, 1, t), ..., X(ω_RANGE, q(ω_RANGE), t). This is illustrated in FIG. 3(b), which enlarges part of FIG. 3(a).

The CPU 121 stores the relabeled MDCT coefficients X(ω_RANGE, 1, t), ..., X(ω_RANGE, q(ω_RANGE), t) (1 ≤ ω_RANGE ≤ ω_MaxRANGE) in the storage unit 125.

In addition, the maximum of the MDCT coefficients in the medium partition band ω_RANGE at time t is called the medium partition band maximum value X_MAX(ω_RANGE, t).
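The relabeling and the band maximum can be pictured as splitting a flat list of coefficients into per-band lists and taking the maximum of each. The sketch below is illustrative only: the names `split_into_bands` and `band_maxima` and the sample values are hypothetical, and the coefficients are taken to be positive, as in the figures (sign handling is left to a separate mechanism, as the text notes).

```python
def split_into_bands(coeffs, q):
    """Regroup X_0 .. X_{M/2-1} into bands; q[i] is the size of band i+1."""
    if sum(q) != len(coeffs):
        raise ValueError("band sizes must cover all coefficients exactly")
    bands, start = [], 0
    for size in q:
        bands.append(coeffs[start:start + size])
        start += size
    return bands


def band_maxima(bands):
    """Return X_MAX for each band (assumes positive coefficients)."""
    return [max(band) for band in bands]


# Hypothetical frame with 8 coefficients and 3 bands of sizes 2, 2 and 4.
coeffs = [4.0, 9.0, 2.0, 6.0, 1.0, 3.0, 8.0, 5.0]
bands = split_into_bands(coeffs, [2, 2, 4])
x_max = band_maxima(bands)
print(bands)   # -> [[4.0, 9.0], [2.0, 6.0], [1.0, 3.0, 8.0, 5.0]]
print(x_max)   # -> [9.0, 6.0, 8.0]
```

Here `bands[i][j]` plays the role of X(i+1, j+1, t), since Python lists are indexed from 0 while the text numbers bands and positions from 1.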

In the following, for ease of understanding, the resolution along the vertical axis of the graphs in FIGS. 2 and 3, that is, the number of bits allocated for digitization, is assumed to be the same for all medium partition bands, although a different number of bits may be predetermined for each band. For example, large partition bands may be defined, each grouping several consecutive medium partition bands, with the precision used for handling MDCT coefficients predetermined for each large partition band and, in view of auditory characteristics, set higher for large partition bands at lower frequencies. This is because hearing is more sensitive to differences in loudness at lower frequencies. Also, although the MDCT coefficients are used as they are in the difference calculations and other operations below, the processing may instead be carried out on the logarithms of the MDCT coefficients, which are converted back to the original MDCT coefficients at the final stage.

In this embodiment, when the speech encoding and decoding device 111 sends or receives the MDCT coefficients for a time t, it uses the MDCT coefficients for the time Δt earlier. For ease of understanding, the operations performed by the device 111 on the encoding side and the device 111 on the decoding side, and the information exchanged between them, are first outlined with reference to FIGS. 4 to 6. A more detailed processing flow is then described with reference to the flowcharts in FIG. 7 and later figures.

The characteristic feature of this embodiment is that what is exchanged is information based on the change in the spectrum between time t-Δt and time t, that is, a difference. It is therefore assumed that, when communication from the encoding-side device 111 to the decoding-side device 111 starts, the MDCT coefficients needed as initial values are conveyed from the former to the latter by any known method. When communication continues for a long time, the error that accumulates as the differences are summed may become too large to ignore. To deal with this, a refresh rate may be set in advance so that the same initialization as at the start of communication is performed at a fixed frequency. Only the exchange of differences, which is the characteristic processing of this embodiment, is described below.
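The effect of a refresh on accumulated error can be illustrated with a toy model. The sketch below assumes, beyond what the text states, that the error comes from quantizing each difference to a fixed step before it is sent, and that a refresh conveys the true value without loss. The function name `track_with_refresh` and all numbers are hypothetical.

```python
def track_with_refresh(true_maxima, step, refresh_every=None):
    """Return the receiver's running estimate of a sequence of band maxima."""
    received = []
    estimate = None
    for i, value in enumerate(true_maxima):
        if i == 0 or (refresh_every and i % refresh_every == 0):
            # Initial value or refresh: assumed to be conveyed exactly.
            estimate = value
        else:
            diff = value - true_maxima[i - 1]        # difference formed by the transmitter
            estimate += step * round(diff / step)    # receiver adds the quantized difference
        received.append(estimate)
    return received


def worst_error(true_maxima, received):
    return max(abs(a - b) for a, b in zip(true_maxima, received))


# A band maximum that rises by 0.4 per frame; with step 1.0 every difference rounds to 0.
true_maxima = [10.0 + 0.4 * i for i in range(9)]
no_refresh = track_with_refresh(true_maxima, step=1.0)
with_refresh = track_with_refresh(true_maxima, step=1.0, refresh_every=4)
print(worst_error(true_maxima, no_refresh))    # grows with the number of frames
print(worst_error(true_maxima, with_refresh))  # bounded by the refresh interval
```

In this toy model the error without refresh keeps growing, while a refresh every four frames caps it at the drift accumulated over three frames.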

In FIGS. 4 to 6, the encoding-side speech encoding and decoding device 111 is drawn on the left and the decoding-side device 111 on the right. These devices are referred to below simply as the transmitter and the receiver. To keep the figures uncluttered, FIGS. 4 to 6 omit all components of the device 111 shown in FIG. 1 other than the storage unit 125 and the antenna 163.

First, as shown in FIG. 4(a), the storage units 125 of both the transmitter and the receiver hold X_MAX(ω_RANGE, t-Δt), the maximum value of the MDCT coefficients in each medium partition band at time t-Δt. When time t arrives, the CPU 121 of the transmitter calculates the MDCT coefficients for time t and stores them in the transmitter's storage unit 125 (see FIG. 4(a)).

Next, the CPU 121 of the transmitter searches within each medium partition band, determines X_MAX(ω_RANGE, t), the maximum value of the MDCT coefficients in the band at time t, and stores it in the storage unit 125. At this point, as shown in FIG. 4(b), the transmitter's storage unit 125 holds the band maxima for times t-Δt and t and the MDCT coefficients for time t. The receiver's storage unit 125 still holds only X_MAX(ω_RANGE, t-Δt), unchanged.

The CPU 121 of the transmitter subtracts X_MAX(ω_RANGE, t-Δt), held in the transmitter's storage unit 125, from X_MAX(ω_RANGE, t), also held there, to obtain the difference of the maximum value at time t, and stores the difference in the storage unit 125. After this the transmitter no longer needs X_MAX(ω_RANGE, t-Δt), which may be erased so as not to strain the capacity of the storage unit 125. The CPU 121 of the transmitter then divides the MDCT coefficients for time t, held in the storage unit 125, by X_MAX(ω_RANGE, t) to obtain the normalized MDCT coefficients for time t, and stores them in the storage unit 125. At this point, as shown in FIG. 5(a), the transmitter's storage unit 125 holds X_MAX(ω_RANGE, t), the difference of the maximum value at time t, and the normalized MDCT coefficients for time t. The receiver's storage unit 125 still holds only X_MAX(ω_RANGE, t-Δt), unchanged.

The CPU 121 of the transmitter reads the difference of the maximum value at time t and the normalized MDCT coefficients for time t from the storage unit 125, quantizes them, entropy-encodes them, and transmits them wirelessly from the transmitter's antenna 163. The radio signal carrying the codes produced by this entropy encoding is picked up by the receiver's antenna 163. This is shown schematically in FIG. 5(b). Representative entropy coding methods include Huffman coding and the range coder.

The codes picked up by the receiver's antenna 163 are decoded by the receiver's CPU 121. The resulting difference of the maximum value at time t and normalized MDCT coefficients for time t are stored in the receiver's storage unit 125. At this point, as shown in FIG. 6(a), the receiver's storage unit 125 holds X_MAX(ω_RANGE, t-Δt), the difference of the maximum value at time t, and the normalized MDCT coefficients for time t. The transmitter's storage unit 125 retains X_MAX(ω_RANGE, t).

The CPU 121 of the receiver adds the difference of the maximum value at time t, held in the receiver's storage unit 125, to X_MAX(ω_RANGE, t-Δt), also held there, to obtain X_MAX(ω_RANGE, t), the maximum value of the MDCT coefficients in each medium partition band at time t, and stores it in the storage unit 125. After this, X_MAX(ω_RANGE, t-Δt) and the difference of the maximum value at time t are no longer needed and may be erased. The CPU 121 of the receiver then multiplies the normalized MDCT coefficients for time t, held in the storage unit 125, by X_MAX(ω_RANGE, t) to obtain the MDCT coefficients for time t, and stores them in the storage unit 125. At this point, as shown in FIG. 6(b), the receiver's storage unit 125 holds X_MAX(ω_RANGE, t) and the MDCT coefficients for time t. The transmitter's storage unit 125 retains X_MAX(ω_RANGE, t).

In this way, the MDCT coefficients for time t, which were initially held in the transmitter's storage unit 125 as shown in FIG. 4(a), end up in the receiver's storage unit 125 as shown in FIG. 6(b). This means that information about the spectrum has been conveyed from the transmitter to the receiver. The receiver can then restore the audio signal that was input to the transmitter by applying the inverse frequency transform and related processing.

Just as the storage units 125 of both the transmitter and the receiver held X_MAX(ω_RANGE, t-Δt) in FIG. 4(a), both hold X_MAX(ω_RANGE, t) in FIG. 6(b). From time t+Δt onward, the MDCT coefficients for each time can therefore be conveyed from the transmitter to the receiver by repeating the processing shown in FIGS. 4 to 6.

The MDCT coefficients themselves take a wide variety of values. In contrast, because an audio signal is continuous in time, relatively small values occur with high frequency among the differences of the maximum value described above. This tendency is even more pronounced during periods when the audio signal is in a steady state. Information with such a skewed distribution compresses efficiently under entropy coding. This embodiment can therefore restore higher-quality speech for a given transmission rate than simply encoding the MDCT coefficients themselves.
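The compression argument can be made concrete by comparing the empirical Shannon entropy of a slowly varying sequence with that of its frame-to-frame differences. The sketch below uses a made-up sequence of already quantized band maxima; the data and the function name `entropy_bits` are illustrative assumptions, and real speech would of course behave less regularly.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Hypothetical quantized band maxima over 12 consecutive frames.
maxima = [100, 101, 101, 102, 103, 103, 104, 105, 105, 106, 107, 107]
diffs = [b - a for a, b in zip(maxima, maxima[1:])]

print("values:     ", round(entropy_bits(maxima), 3), "bits/symbol")
print("differences:", round(entropy_bits(diffs), 3), "bits/symbol")
```

In this example the raw values spread over eight distinct symbols, while the differences use only two, so an entropy coder needs fewer bits per symbol for the differences.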

For ease of understanding, the description above with reference to FIGS. 4 to 6 used Δt for both the interval at which MDCT coefficients are computed and the interval of the encoding processing, but these two intervals need not be equal. For example, as long as the real-time feel of a voice call is not impaired, several sets of MDCT coefficients computed from the audio signal in several consecutive time periods may be accumulated in the transmitter's storage unit 125 and then quantized and entropy-encoded together.

The flow of the processing described above is explained below with reference to flowcharts. FIG. 7 is a flowchart of the medium partition band maximum value search, the medium partition band difference calculation, and the normalized MDCT coefficient calculation performed at the transmitter at time t. It is assumed that the MDCT has already been applied to the digital audio signal and that the MDCT coefficients are stored in the transmitter's storage unit 125.

The CPU 121 of the transmitter initializes the band identification variable ω_RANGE to 1 (step S711), loads the MDCT coefficients X(ω_RANGE, 1, t), ..., X(ω_RANGE, q(ω_RANGE), t) from the storage unit 125 (step S713), finds the medium partition band maximum value X_MAX(ω_RANGE, t), the largest of the loaded MDCT coefficients (step S715), and stores X_MAX(ω_RANGE, t) in the storage unit 125 (step S717).

X_MAX(ω_RANGE, t) is stored in the storage unit 125 in step S717 because it is needed for the processing at the next time, t+Δt.

The CPU 121 loads the medium partition band maximum value of the immediately preceding time, X_MAX(ω_RANGE, t-Δt), from the storage unit 125 (step S719).

The CPU 121 can load X_MAX(ω_RANGE, t-Δt) from the storage unit 125 in step S719 because it was stored there in the step corresponding to step S717 at the immediately preceding time.

The CPU 121 calculates the medium partition band difference ΔX_MAX(ω_RANGE, t) as ΔX_MAX(ω_RANGE, t) = X_MAX(ω_RANGE, t) - X_MAX(ω_RANGE, t-Δt) (step S721) and stores it in the storage unit 125 (step S723). The stored ΔX_MAX(ω_RANGE, t) is subject to encoding. The CPU 121 then calculates the normalized MDCT coefficients X_REG(ω_RANGE, 1, t), ..., X_REG(ω_RANGE, q(ω_RANGE), t) as X_REG(ω_RANGE, 1, t) = X(ω_RANGE, 1, t) / X_MAX(ω_RANGE, t), ..., X_REG(ω_RANGE, q(ω_RANGE), t) = X(ω_RANGE, q(ω_RANGE), t) / X_MAX(ω_RANGE, t) (step S725) and stores them in the storage unit 125 (step S727). The stored normalized coefficients are also subject to encoding. The CPU 121 then determines whether all medium partition bands have been processed (step S729). If so (step S729; Yes), the processing ends; if not (step S729; No), ω_RANGE is incremented by 1 to process the next band (step S731) and the processing returns to step S713.
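The loop of FIG. 7 can be sketched as a single function over all medium partition bands. This is an illustrative reading of the flowchart, not the patented implementation: the name `encode_frame`, the list-based data layout, and the sample values are assumptions, and the sketch assumes positive coefficients with a nonzero band maximum, since the text does not say how a zero maximum is handled.

```python
def encode_frame(bands, prev_max):
    """One pass of the FIG. 7 loop.

    bands:    bands[i] holds the MDCT coefficients of band i+1 at time t
    prev_max: band maxima X_MAX(., t - dt) kept from the previous frame
    Returns (cur_max, deltas, normalized).
    """
    cur_max, deltas, normalized = [], [], []
    for band, x_prev in zip(bands, prev_max):            # S711, S729, S731: loop over bands
        x_max = max(band)                                # S713 to S715: band maximum
        cur_max.append(x_max)                            # S717: keep for time t + dt
        deltas.append(x_max - x_prev)                    # S719 to S723: difference
        normalized.append([x / x_max for x in band])     # S725 to S727: normalization
    return cur_max, deltas, normalized


# Hypothetical frame with two bands.
bands = [[4.0, 8.0], [2.0, 1.0, 4.0]]
prev_max = [6.0, 5.0]
cur_max, deltas, normalized = encode_frame(bands, prev_max)
print(cur_max)     # -> [8.0, 4.0]
print(deltas)      # -> [2.0, -1.0]
print(normalized)  # -> [[0.5, 1.0], [0.5, 0.25, 1.0]]
```

The values in `deltas` and `normalized` are what would then be quantized and entropy-encoded, while `cur_max` is retained as `prev_max` for the next frame.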

The flow of the processing performed at the receiver at time t, which corresponds to the transmitter processing at time t shown in the flowchart of FIG. 7, is described with reference to the flowchart in FIG. 8. The receiver calculates the medium partition band maximum values and the MDCT coefficients. It is assumed that the medium partition band difference ΔX_MAX(ω_RANGE, t) and the normalized MDCT coefficients X_REG(ω_RANGE, 1, t), ..., X_REG(ω_RANGE, q(ω_RANGE), t), which were entropy-encoded by the transmitter and conveyed to the receiver, have already been decoded and stored in the receiver's storage unit 125.

The CPU 121 of the receiver initializes the band identification variable ω_RANGE to 1 (step S741), loads the medium partition band maximum value of the immediately preceding time, X_MAX(ω_RANGE, t-Δt), from the storage unit 125 (step S743), loads the medium partition band difference ΔX_MAX(ω_RANGE, t) (step S745), obtains the medium partition band maximum value as X_MAX(ω_RANGE, t) = X_MAX(ω_RANGE, t-Δt) + ΔX_MAX(ω_RANGE, t) (step S747), and stores X_MAX(ω_RANGE, t) in the storage unit 125 (step S749).

X_MAX(ω_RANGE, t) is stored in the storage unit 125 in step S749 because it is needed for the processing at the next time, t+Δt. Likewise, it is because the processing corresponding to step S749 was performed at the previous time, t-Δt, that the CPU 121 could load X_MAX(ω_RANGE, t-Δt) from the storage unit 125 in step S743 at time t.

The CPU 121 loads the normalized MDCT coefficients X_REG(ω_RANGE, 1, t), ..., X_REG(ω_RANGE, q(ω_RANGE), t) (step S751), calculates the MDCT coefficients X(ω_RANGE, 1, t), ..., X(ω_RANGE, q(ω_RANGE), t) as X(ω_RANGE, 1, t) = X_REG(ω_RANGE, 1, t) × X_MAX(ω_RANGE, t), ..., X(ω_RANGE, q(ω_RANGE), t) = X_REG(ω_RANGE, q(ω_RANGE), t) × X_MAX(ω_RANGE, t) (step S753), and stores X(ω_RANGE, 1, t), ..., X(ω_RANGE, q(ω_RANGE), t) in the storage unit 125 (step S755). The audio signal is restored by applying well-known processing, such as conversion to the time domain, to these MDCT coefficients. The CPU 121 then determines whether all medium partition bands have been processed (step S757). If so (step S757; Yes), the processing ends; if not (step S757; No), ω_RANGE is incremented by 1 to process the next band (step S759) and the processing returns to step S743.
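The receiver loop of FIG. 8 can be sketched in the same style. As before, the name `decode_frame`, the list layout, and the sample values are illustrative assumptions, and quantization and entropy decoding are taken to have happened already.

```python
def decode_frame(prev_max, deltas, normalized):
    """One pass of the FIG. 8 loop.

    prev_max:   band maxima X_MAX(., t - dt) kept from the previous frame
    deltas:     decoded band differences for time t
    normalized: decoded normalized MDCT coefficients, one list per band
    Returns (cur_max, bands).
    """
    cur_max, bands = [], []
    for x_prev, delta, x_reg in zip(prev_max, deltas, normalized):  # S741, S757, S759
        x_max = x_prev + delta                      # S743 to S747: restore the band maximum
        cur_max.append(x_max)                       # S749: keep for time t + dt
        bands.append([r * x_max for r in x_reg])    # S751 to S755: undo the normalization
    return cur_max, bands


# Hypothetical decoded data for a frame with two bands.
prev_max = [6.0, 5.0]
deltas = [2.0, -1.0]
normalized = [[0.5, 1.0], [0.5, 0.25, 1.0]]
cur_max, bands = decode_frame(prev_max, deltas, normalized)
print(cur_max)  # -> [8.0, 4.0]
print(bands)    # -> [[4.0, 8.0], [2.0, 1.0, 4.0]]
```

The restored `bands` would then go through the inverse transform, and `cur_max` becomes `prev_max` for the next frame, mirroring what the transmitter keeps.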

(Modification of Embodiment 1)
A speech encoding and decoding device according to a modification of Embodiment 1 of the present invention is described below. The outline of the device is the same as that of the speech encoding and decoding device 111 according to Embodiment 1.

Embodiment 1 used a difference as the quantity representing the change in the medium partition band maximum value. This modification uses a ratio instead. The processing is otherwise almost the same.

The transmitter performs the processing of the flowchart in FIG. 7 described above, with part of it changed. In step S721 of FIG. 7, the medium partition band ratio RaX_MAX(ω_RANGE, t) is calculated instead, as RaX_MAX(ω_RANGE, t) = X_MAX(ω_RANGE, t) / X_MAX(ω_RANGE, t-Δt). In step S723, RaX_MAX(ω_RANGE, t) is stored in the storage unit 125.

The receiver performs the processing of the flowchart in FIG. 8 described above, with part of it changed. In step S745 of FIG. 8, the medium partition band ratio RaX_MAX(ω_RANGE, t) is loaded instead. In step S747, the medium partition band maximum value is obtained as X_MAX(ω_RANGE, t) = X_MAX(ω_RANGE, t-Δt) × RaX_MAX(ω_RANGE, t).

Because the values that appear as the medium partition band ratio RaX_MAX(ω_RANGE, t) are concentrated near 1, they can be encoded with high efficiency.
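The ratio-based exchange described in this modification can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, quantization and entropy coding are omitted, and the previous band maxima are assumed to be nonzero (the description does not say how a zero previous maximum would be handled).

```python
def band_ratios(x_max_now, x_max_prev):
    """Transmitter side (modified step S721).

    Computes RaX_MAX = X_MAX(t) / X_MAX(t - dt) for each medium partition band.
    Assumption: every previous maximum is nonzero.
    """
    return [now / prev for now, prev in zip(x_max_now, x_max_prev)]


def restore_band_maxima(ratios, x_max_prev):
    """Receiver side (modified step S747).

    Computes X_MAX(t) = X_MAX(t - dt) * RaX_MAX for each medium partition band.
    """
    return [prev * ratio for prev, ratio in zip(x_max_prev, ratios)]


# Example values chosen for illustration only.
previous = [4.0, 2.0, 8.0]
current = [5.0, 2.0, 6.0]
ratios = band_ratios(current, previous)
restored = restore_band_maxima(ratios, previous)
```

In this example the ratios stay close to 1 when the band maxima change slowly, which is the property the modification relies on for efficient entropy coding.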

(Embodiment 2)
A speech encoding/decoding device according to Embodiment 2 of the present invention is described below. The outline of the device is the same as that of the speech encoding/decoding device 111 according to Embodiment 1. The operations performed by the transmitter and the receiver, and the outline of the information exchanged between them, are also almost the same as in Embodiment 1 described with reference to FIGS. 4 to 6.

In Embodiment 1 and its modification, the difference values or ratios for all the medium partition bands were exchanged between the transmitter and the receiver. In the present embodiment, by contrast, only the largest of the difference values or ratios of the medium partition bands is exchanged between them. The flow of this processing is described below with reference to the flowcharts shown in FIGS. 9 to 11.

FIG. 9 is a flowchart showing the flow of the medium partition band maximum value search, the calculation of the medium partition band differences, and the calculation of the maximum difference, performed at the transmitter at time t. It is assumed that the digital audio signal has already been subjected to MDCT and that the MDCT coefficients are stored in the storage unit 125 of the transmitter.

The CPU 121 of the transmitter initializes the maximum difference MaxΔX_MAX(t) to 0 (step S771), initializes the band identification variable ω_RANGE to 1 (step S773), loads the MDCT coefficients X(ω_RANGE, 1, t), ..., X(ω_RANGE, q(ω_RANGE), t) from the storage unit 125 (step S775), obtains the medium partition band maximum value X_MAX(ω_RANGE, t), which is the maximum of the loaded MDCT coefficients (step S777), and stores X_MAX(ω_RANGE, t) in the storage unit 125 (step S779). The value is stored in step S779 so that it can be used in the processing at the next time, t+Δt.

The CPU 121 loads the medium partition band maximum value of the immediately preceding time, X_MAX(ω_RANGE, t-Δt), from the storage unit 125 (step S781). This load is possible because a step corresponding to step S779 was executed at the immediately preceding time.

The CPU 121 calculates the medium partition band difference ΔX_MAX(ω_RANGE, t) as ΔX_MAX(ω_RANGE, t) = X_MAX(ω_RANGE, t) - X_MAX(ω_RANGE, t-Δt) (step S783), and determines whether ΔX_MAX(ω_RANGE, t) is greater than or equal to MaxΔX_MAX(t) (step S785). If it is (step S785; Yes), the CPU 121 updates MaxΔX_MAX(t) as MaxΔX_MAX(t) = ΔX_MAX(ω_RANGE, t) (step S787) and then proceeds to step S789. If it is not (step S785; No), the CPU 121 proceeds directly to step S789. In step S789, the CPU 121 determines whether processing has been completed for all the medium partition bands. If it has (step S789; Yes), the process proceeds to step S793; if not (step S789; No), the CPU 121 increments ω_RANGE by 1 in order to process the next band (step S791) and returns to step S775. In step S793, the CPU 121 stores MaxΔX_MAX(t) in the storage unit 125 and then ends the process. The MaxΔX_MAX(t) stored in step S793 is subject to encoding.
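The loop of FIG. 9 can be sketched in a few lines. The following is an illustrative sketch only: the function name and the list-of-lists data layout are assumptions made here, and the storage unit 125 is modeled simply by passing in the previous band maxima and returning the new ones.

```python
def max_band_difference(coeffs_by_band, prev_band_max):
    """Sketch of the FIG. 9 flow (steps S771 to S793).

    coeffs_by_band: MDCT coefficients X(w, 1..q(w), t), one list per band.
    prev_band_max:  X_MAX(w, t - dt) for each band, kept from the previous time.
    Returns (MaxDeltaX_MAX(t), [X_MAX(w, t) for each band]).
    """
    max_diff = 0.0                                   # S771: initialize to 0
    band_max = []
    for coeffs, prev in zip(coeffs_by_band, prev_band_max):  # S773, S789, S791
        x_max = max(coeffs)                          # S777: band maximum
        band_max.append(x_max)                       # S779: keep for time t + dt
        diff = x_max - prev                          # S781, S783: band difference
        if diff >= max_diff:                         # S785
            max_diff = diff                          # S787
    return max_diff, band_max                        # S793: value to be encoded


# Example values chosen for illustration only.
max_diff, band_max = max_band_difference(
    [[1.0, 3.0], [2.0, 5.0], [4.0, 1.0]],
    [2.0, 4.5, 4.0],
)
```

Because the maximum difference starts at 0 in step S771, the result is never negative: if every band maximum decreases, the value that is encoded is 0.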

After completing the processing shown in the flowchart of FIG. 9, the CPU 121 of the transmitter calculates the normalized MDCT coefficients for the maximum-difference case by the processing shown in the flowchart of FIG. 10.

The CPU 121 of the transmitter loads the maximum difference MaxΔX_MAX(t) from the storage unit 125 (step S811), initializes the band identification variable ω_RANGE to 1 (step S813), loads the MDCT coefficients X(ω_RANGE, 1, t), ..., X(ω_RANGE, q(ω_RANGE), t) from the storage unit 125 (step S815), and loads the medium partition band maximum value of the immediately preceding time, X_MAX(ω_RANGE, t-Δt) (step S817). This load is possible because a step corresponding to step S779 of FIG. 9 was executed at the immediately preceding time. The CPU 121 then calculates the normalized MDCT coefficients X_REG(ω_RANGE, 1, t), ..., X_REG(ω_RANGE, q(ω_RANGE), t) as X_REG(ω_RANGE, 1, t) = X(ω_RANGE, 1, t) / {X_MAX(ω_RANGE, t-Δt) + MaxΔX_MAX(t)}, ..., X_REG(ω_RANGE, q(ω_RANGE), t) = X(ω_RANGE, q(ω_RANGE), t) / {X_MAX(ω_RANGE, t-Δt) + MaxΔX_MAX(t)} (step S819), and stores them in the storage unit 125 (step S821). The stored X_REG(ω_RANGE, 1, t), ..., X_REG(ω_RANGE, q(ω_RANGE), t) are subject to encoding. The CPU 121 then determines whether processing has been completed for all the medium partition bands (step S823). If it has (step S823; Yes), the process ends; if not (step S823; No), the CPU 121 increments ω_RANGE by 1 in order to process the next band (step S825) and returns to step S815.

The flow of the processing performed at the receiver at time t, corresponding to the processing shown in the flowcharts of FIGS. 9 and 10 performed at the transmitter at time t, is described with reference to the flowchart shown in FIG. 11. In the present embodiment, the receiver calculates the medium partition band maximum values and the MDCT coefficients based on the maximum difference MaxΔX_MAX(t) transmitted from the transmitter. It is assumed that the maximum difference MaxΔX_MAX(t) and the normalized MDCT coefficients X_REG(ω_RANGE, 1, t), ..., X_REG(ω_RANGE, q(ω_RANGE), t), which were entropy-encoded by the transmitter and then transmitted to the receiver, have already been decoded and stored in the storage unit 125 of the receiver.

The CPU 121 of the receiver loads the maximum difference MaxΔX_MAX(t) from the storage unit 125 (step S831), initializes the band identification variable ω_RANGE to 1 (step S833), loads the medium partition band maximum value of the immediately preceding time, X_MAX(ω_RANGE, t-Δt), from the storage unit 125 (step S835), loads the normalized MDCT coefficients X_REG(ω_RANGE, 1, t), ..., X_REG(ω_RANGE, q(ω_RANGE), t) from the storage unit 125 (step S837), calculates the MDCT coefficients X(ω_RANGE, 1, t), ..., X(ω_RANGE, q(ω_RANGE), t) as X(ω_RANGE, 1, t) = X_REG(ω_RANGE, 1, t) × {X_MAX(ω_RANGE, t-Δt) + MaxΔX_MAX(t)}, ..., X(ω_RANGE, q(ω_RANGE), t) = X_REG(ω_RANGE, q(ω_RANGE), t) × {X_MAX(ω_RANGE, t-Δt) + MaxΔX_MAX(t)} (step S839), and stores them in the storage unit 125 (step S841). The audio signal is restored by applying well-known processing, such as conversion to the real time domain, to these MDCT coefficients.

The CPU 121 then obtains the medium partition band maximum value X_MAX(ω_RANGE, t), which is the maximum of X(ω_RANGE, 1, t), ..., X(ω_RANGE, q(ω_RANGE), t) obtained in step S839 (step S843), and stores it in the storage unit 125 (step S845). The value is stored in step S845 so that it can be used in the processing at the next time, t+Δt. X_MAX(ω_RANGE, t-Δt) can be loaded in step S835 because a step corresponding to step S845 was executed at the preceding time, t-Δt. The CPU 121 then determines whether processing has been completed for all the medium partition bands (step S847). If it has (step S847; Yes), the process ends; if not (step S847; No), the CPU 121 increments ω_RANGE by 1 in order to process the next band (step S849) and returns to step S835.
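The normalization of FIG. 10 and its inverse in FIG. 11 can be sketched as a round trip. This is a simplified illustration under stated assumptions: the function names and the list-of-lists layout are hypothetical, quantization and entropy coding are left out, and the divisor X_MAX(t - dt) + MaxDeltaX_MAX(t) is assumed to be nonzero.

```python
def normalize_with_max_difference(coeffs_by_band, prev_band_max, max_diff):
    """Transmitter side, sketch of step S819.

    X_REG = X / (X_MAX(w, t - dt) + MaxDeltaX_MAX(t)) for every coefficient.
    """
    return [
        [x / (prev + max_diff) for x in coeffs]
        for coeffs, prev in zip(coeffs_by_band, prev_band_max)
    ]


def restore_with_max_difference(normalized_by_band, prev_band_max, max_diff):
    """Receiver side, sketch of steps S839 and S843.

    X = X_REG * (X_MAX(w, t - dt) + MaxDeltaX_MAX(t)); the new band maxima are
    then taken from the restored coefficients for use at time t + dt.
    """
    restored = [
        [x_reg * (prev + max_diff) for x_reg in band]
        for band, prev in zip(normalized_by_band, prev_band_max)
    ]
    band_max = [max(band) for band in restored]
    return restored, band_max


# Example values chosen for illustration only.
coeffs = [[1.0, 4.0], [2.0, 6.0]]
prev_max = [2.0, 6.0]
max_diff = 2.0  # largest band difference: (4 - 2) = 2, (6 - 6) = 0
normalized = normalize_with_max_difference(coeffs, prev_max, max_diff)
restored, new_band_max = restore_with_max_difference(normalized, prev_max, max_diff)
```

Only the single value `max_diff` accompanies the normalized coefficients here; the receiver rebuilds each band's divisor from its own stored previous maxima, which is why steps S843 and S845 are needed on the receiver side.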

Comparing step S725 of FIG. 7 with step S819 of FIG. 10 makes the following clear. In Embodiment 1, the divisor used to obtain the normalized MDCT coefficients is the maximum MDCT coefficient of each medium partition band, so the normalized MDCT coefficients are obtained with the best accuracy. In the present embodiment, by contrast, a value based on the maximum difference is used as the divisor, so the normalized MDCT coefficients are less accurate than in Embodiment 1, and the MDCT coefficients restored by the receiver are accordingly less accurate as well. In other words, as is clear from the definition of the maximum difference MaxΔX_MAX(t) shown in FIG. 9, the divisor for normalization in Embodiment 1, X_MAX(ω_RANGE, t), and the divisor for normalization in the present embodiment, X_MAX(ω_RANGE, t-Δt) + MaxΔX_MAX(t), satisfy the relationship X_MAX(ω_RANGE, t) ≦ X_MAX(ω_RANGE, t-Δt) + MaxΔX_MAX(t), because MaxΔX_MAX(t) is at least as large as the difference ΔX_MAX(ω_RANGE, t) of every band. In the present embodiment, therefore, the MDCT coefficients are likely to be divided by a larger value than necessary in many cases. When that happens, the normalized MDCT coefficients become smaller than necessary overall. The number of bits used to represent a normalized MDCT coefficient is reasonably fixed in advance on the assumption that, by the nature of normalization, the normalized MDCT coefficient takes a value between 0 and 1 inclusive. Consequently, when the normalized MDCT coefficients become smaller than necessary as described above, the bits prepared to represent numbers close to 1 are wasted, and the error incurred when quantizing to that number of bits becomes larger. In this sense, the present embodiment performs speech encoding and decoding with lower accuracy than Embodiment 1.
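The relationship between the two divisors, and its effect on quantization error, can be written out step by step. The derivation below is a sketch: the symbol D, the restored coefficient \hat{X}, and the uniform rounding quantizer with step size s are assumptions introduced here for illustration and do not appear in the original description.

```latex
\begin{align*}
\Delta X_{MAX}(\omega_{RANGE}, t)
  &\le \mathrm{Max}\Delta X_{MAX}(t)
  \quad \text{for every band } \omega_{RANGE}, \\
X_{MAX}(\omega_{RANGE}, t)
  &= X_{MAX}(\omega_{RANGE}, t-\Delta t) + \Delta X_{MAX}(\omega_{RANGE}, t) \\
  &\le X_{MAX}(\omega_{RANGE}, t-\Delta t) + \mathrm{Max}\Delta X_{MAX}(t)
  =: D(\omega_{RANGE}, t).
\end{align*}
% Assumption: X_REG is rounded to a uniform grid with step s before transmission,
% so the rounding error in X_REG is at most s/2 and is scaled by the divisor on restoration.
\[
|X - \hat{X}| \le \tfrac{s}{2}\, X_{MAX}(\omega_{RANGE}, t)
\quad \text{(Embodiment 1)},
\qquad
|X - \hat{X}| \le \tfrac{s}{2}\, D(\omega_{RANGE}, t)
\quad \text{(Embodiment 2)}.
\]
```

Under these assumptions the error bound in Embodiment 2 is never smaller than that in Embodiment 1, and the two coincide only for bands whose difference equals the maximum difference.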

However, whereas in Embodiment 1 the differences in all the medium partition bands had to be exchanged between the transmitter and the receiver, in the present embodiment only the maximum of the differences over all the medium partition bands needs to be exchanged. The present embodiment therefore reduces the amount of data to be encoded compared with Embodiment 1, which contributes to low-bit-rate communication.

(Modification of Embodiment 2)
A speech encoding/decoding device according to a modification of Embodiment 2 of the present invention is described below. The outline of the device is the same as that of the speech encoding/decoding device 111 according to Embodiment 1.

In Embodiment 2, the difference was used as the quantity representing the change in the medium partition band maximum value. In this modification, the ratio is used instead. The processing is otherwise almost the same in both.

The processing performed by the transmitter is a partially modified version of the flowcharts of FIGS. 9 and 10 already described. Specifically, in step S771 of FIG. 9, the maximum difference MaxΔX_MAX(t) is replaced with the maximum ratio MaxRaX_MAX(t); step S783 is changed so that the medium partition band ratio RaX_MAX(ω_RANGE, t) is calculated as RaX_MAX(ω_RANGE, t) = X_MAX(ω_RANGE, t) / X_MAX(ω_RANGE, t-Δt); step S785 is changed so that it determines whether RaX_MAX(ω_RANGE, t) ≧ MaxRaX_MAX(t); step S787 is changed so that the update is MaxRaX_MAX(t) = RaX_MAX(ω_RANGE, t); and step S793 is changed so that MaxRaX_MAX(t) is stored in the storage unit 125. In addition, step S811 of FIG. 10 is changed so that the maximum ratio MaxRaX_MAX(t) is loaded, and step S819 is changed so that the normalized MDCT coefficients are calculated as X_REG(ω_RANGE, 1, t) = X(ω_RANGE, 1, t) / {X_MAX(ω_RANGE, t-Δt) × MaxRaX_MAX(t)}, ..., X_REG(ω_RANGE, q(ω_RANGE), t) = X(ω_RANGE, q(ω_RANGE), t) / {X_MAX(ω_RANGE, t-Δt) × MaxRaX_MAX(t)}.

The processing performed by the receiver is a partially modified version of the flowchart of FIG. 11 already described. Specifically, step S831 of FIG. 11 is changed so that the maximum ratio MaxRaX_MAX(t) is loaded, and step S839 is changed so that the MDCT coefficients are calculated as X(ω_RANGE, 1, t) = X_REG(ω_RANGE, 1, t) × {X_MAX(ω_RANGE, t-Δt) × MaxRaX_MAX(t)}, ..., X(ω_RANGE, q(ω_RANGE), t) = X_REG(ω_RANGE, q(ω_RANGE), t) × {X_MAX(ω_RANGE, t-Δt) × MaxRaX_MAX(t)}.

As for the ratio, this modification has the same effect as Embodiment 2 in that only the maximum ratio MaxRaX_MAX(t) needs to be encoded, rather than the ratios for all the medium partition bands. It also has the following additional effect.

Consider a spectrum in which the character of each medium partition band is represented by the maximum MDCT coefficient in that band. Because of the characteristics of speech, such a spectrum tends to change so that the components of the medium partition bands scale proportionally over time, that is, the spectrum as a whole keeps a similar shape, rather than so that the entire band is uniformly raised or lowered over time. Therefore, this modification, which expresses the temporal change of the spectrum by a ratio rather than a difference, can reduce, compared with Embodiment 2, the degree to which encoding accuracy drops because the divisor for normalization is too large.
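The benefit described above can be seen with a small numerical sketch that compares the normalization divisors of Embodiment 2 and of this modification when the spectrum scales proportionally. The function names and the example values are hypothetical, and the band maxima are assumed to be positive.

```python
def divisors_from_max_difference(band_max_now, band_max_prev):
    """Embodiment 2: divisor = X_MAX(w, t - dt) + MaxDeltaX_MAX(t)."""
    # The maximum difference starts from 0, as in step S771.
    max_diff = max([0.0] + [now - prev
                            for now, prev in zip(band_max_now, band_max_prev)])
    return [prev + max_diff for prev in band_max_prev]


def divisors_from_max_ratio(band_max_now, band_max_prev):
    """Modification of Embodiment 2: divisor = X_MAX(w, t - dt) * MaxRaX_MAX(t).

    Assumption: every previous band maximum is positive.
    """
    max_ratio = max(now / prev
                    for now, prev in zip(band_max_now, band_max_prev))
    return [prev * max_ratio for prev in band_max_prev]


# A spectrum whose band maxima all double between t - dt and t.
previous = [8.0, 2.0]
current = [16.0, 4.0]
by_difference = divisors_from_max_difference(current, previous)
by_ratio = divisors_from_max_ratio(current, previous)
```

In this example the difference-based divisor for the second band is 10.0 although the true band maximum is 4.0, whereas the ratio-based divisors equal the true band maxima in both bands. Real speech will rarely scale this uniformly, so the gain in practice depends on how closely the spectrum keeps its shape.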

The present invention is not limited to the above embodiments, and various modifications and applications are possible. The hardware configuration, block configuration, and flowcharts described above are examples and are not limiting.

For example, although the speech encoding/decoding device 111 shown in FIG. 1 has been described assuming a mobile phone, the present invention can easily be applied to a PHS (Personal Handyphone System), a PDA (Personal Digital Assistant), or a general personal computer. In other words, the above embodiments are for illustration and do not limit the scope of the present invention.

FIG. 1 is a diagram showing the configuration of a speech encoding/decoding device according to an embodiment of the present invention.
FIG. 2 is a diagram schematically showing how a speech spectrum is represented by MDCT coefficients belonging to small partition bands.
FIG. 3 is a diagram schematically showing the medium partition bands and the MDCT coefficients as a function of time in Embodiment 1 of the present invention.
FIG. 4 is a diagram showing an outline of the operations performed by the speech encoding device in Embodiment 1 of the present invention.
FIG. 5 is a diagram showing an outline of the information transmitted from the speech encoding device to the speech decoding device in Embodiment 1 of the present invention.
FIG. 6 is a diagram showing an outline of the operations performed by the speech decoding device in Embodiment 1 of the present invention.
FIG. 7 is a diagram showing the flow of the medium partition band maximum value search, the calculation of the medium partition band differences, and the calculation of the normalized MDCT coefficients in Embodiment 1 of the present invention.
FIG. 8 is a diagram showing the flow of the calculation of the medium partition band maximum values and the MDCT coefficients in Embodiment 1 of the present invention.
FIG. 9 is a diagram showing the flow of the medium partition band maximum value search, the calculation of the medium partition band differences, and the calculation of the maximum difference in Embodiment 2 of the present invention.
FIG. 10 is a diagram showing the flow of the calculation of the normalized MDCT coefficients when the maximum difference is used in Embodiment 2 of the present invention.
FIG. 11 is a diagram showing the flow of the calculation of the medium partition band maximum values and the MDCT coefficients based on the maximum difference in Embodiment 2 of the present invention.

Explanation of symbols

111 ... speech encoding/decoding device, 121 ... CPU, 123 ... ROM, 125 ... storage unit, 131 ... RAM, 133 ... hard disk, 141 ... audio processing unit, 151 ... microphone, 153 ... speaker, 161 ... wireless communication unit, 163 ... antenna, 171 ... operation key input processing unit, 173 ... operation keys, 181 ... system bus

Claims

Discrete spectrum conversion means for obtaining a value of a frequency component for each sub-band having a predetermined bandwidth for each predetermined time segment for the digital audio signal;
For each medium partition band composed of a predetermined number of consecutive sub-compartment bands set in advance according to auditory characteristics, for each predetermined time section, out of the frequency component values belonging to the medium-compartment band A maximum value search means for searching for a maximum value;
Maximum value storage means for storing the maximum value searched by the maximum value search means;
Encoding means for quantizing and entropy-encoding and outputting information generated based on the maximum value and a frequency component normalized by division by the maximum value;
With
The maximum value search means includes:
For each predetermined time segment, stores the current maximum value, which is the maximum value searched in that time segment, in the maximum value storage means, acquires from the maximum value storage means the past maximum value, which is the maximum value stored there in a predetermined time segment earlier than that time segment, and converts the current maximum value into a value associated with the past maximum value,
A speech encoding apparatus characterized by that.

A medium partition identification integer is assigned to the medium partition band in order from the low range, and the medium partition band is configured such that the logarithm of the center frequency of the medium partition band linearly depends on the medium partition identification integer. Further comprising a partition band configuration means,
The speech encoding apparatus according to claim 1.

The maximum value search means includes:
Find a difference that is a value obtained by subtracting the past maximum value from the current maximum value,
The encoding means includes
The difference and the normalized frequency component are quantized and entropy encoded and output.
The speech encoding apparatus according to claim 1 or 2, characterized in that

The maximum value search means includes:
Obtain a ratio that is a value obtained by dividing the current maximum value by the past maximum value,
The encoding means includes
The ratio and the normalized frequency component are quantized and entropy encoded and output.
The speech encoding apparatus according to claim 1 or 2, characterized in that

A maximum difference determining means;
The maximum value search means includes:
Find a difference that is a value obtained by subtracting the past maximum value from the current maximum value,
The maximum difference determining means includes
Find the maximum difference that is the maximum value of the differences obtained by the maximum value search means for every medium partition band,
The encoding means includes
The maximum difference and the normalized frequency component are quantized and entropy encoded and output.
The speech encoding apparatus according to claim 1 or 2, characterized in that

A maximum ratio determining means;
The maximum value search means includes:
Obtain a ratio that is a value obtained by dividing the current maximum value by the past maximum value,
The maximum ratio determining means includes
Find the maximum ratio that is the maximum value of the ratios obtained by the maximum value search means for every medium partition band,
The encoding means includes
The maximum ratio and the normalized frequency component are quantized and entropy encoded and output.
The speech encoding apparatus according to claim 1 or 2, characterized in that

The discrete spectrum conversion means includes
Use MDCT (Modified Discrete Cosine Transform),
The speech encoding apparatus according to claim 1, wherein the speech encoding apparatus is a part of the speech encoding apparatus.

Entropy coding is performed on the quantized spectrum of the audio signal for each predetermined time segment, which is a result of performing deformation including normalization for each band, and the normalization value that is a value used for the normalization. Receiving means for receiving the code generated by
Decoding means for decoding the modified spectrum data and the normalization value for each of the predetermined time intervals from the code by a decoding method corresponding to the entropy encoding;
Inverse deformation means for restoring the quantized spectrum for each of the predetermined time segments using the standardized value decoded from the deformed spectrum data decoded;
Normalization value storage means for storing the normalization value;
Discrete spectrum inverse transform means for restoring the speech signal from the restored quantized spectrum;
With
The reverse deformation means includes
For each predetermined time segment, stores the current normalization value, which is the normalization value decoded in that time segment, in the normalization value storage means, acquires from the normalization value storage means the past normalization value, which is the normalization value stored there in a predetermined time segment earlier than that time segment, and restores the quantized spectrum based on the current normalization value and the past normalization value,
A speech decoding apparatus characterized by that.

For a digital audio signal, for each predetermined time segment, a discrete spectrum conversion step for obtaining a value of a frequency component for each sub-compartment band having a predetermined bandwidth;
For each medium partition band composed of a predetermined number of consecutive sub-compartment bands set in advance according to auditory characteristics, for each predetermined time section, out of the frequency component values belonging to the medium-compartment band A maximum value search step for searching for a maximum value;
A maximum value storing step for storing the maximum value searched by the maximum value searching step;
An encoding step of quantizing and entropy-encoding and outputting information generated based on the maximum value and a frequency component normalized by division by the maximum value;
Consisting of
The maximum value search step includes:
For each predetermined time segment, stores, by the maximum value storing step, the current maximum value, which is the maximum value searched in that time segment, acquires the past maximum value, which is the maximum value stored by the maximum value storing step in a predetermined time segment earlier than that time segment, and converts the current maximum value into a value associated with the past maximum value,
A speech encoding method characterized by the above.

Entropy coding is performed on the quantized spectrum of the audio signal for each predetermined time segment, which is a result of performing deformation including normalization for each band, and the normalization value that is a value used for the normalization. A receiving step for receiving a code generated by
A decoding step of decoding the modified spectrum data and the normalization value for each of the predetermined time intervals from the code by a decoding method corresponding to the entropy encoding;
An inverse transformation step of restoring the quantized spectrum for each predetermined time segment using the standardized value decoded from the decoded spectral data,
A normalization value storing step for storing the normalization value;
A discrete spectrum inverse transform step of restoring the speech signal from the restored quantized spectrum;
Consisting of
The reverse deformation step includes
For each predetermined time segment, stores, by the normalization value storing step, the current normalization value, which is the normalization value decoded in that time segment, acquires the past normalization value, which is the normalization value stored by the normalization value storing step in a predetermined time segment earlier than that time segment, and restores the quantized spectrum based on the current normalization value and the past normalization value,
A speech decoding method characterized by the above.

A program for causing a computer to execute:
A discrete spectrum conversion step of obtaining, for a digital speech signal and for each predetermined time segment, the value of the frequency component of each sub-compartment band having a predetermined bandwidth;
A maximum value search step of searching, for each predetermined time segment and for each medium-compartment band composed of a predetermined number of consecutive sub-compartment bands set in advance according to auditory characteristics, for the maximum value among the frequency component values belonging to that medium-compartment band;
A maximum value storing step of storing the maximum value found by the maximum value search step;
An encoding step of quantizing, entropy-encoding, and outputting information generated based on the maximum value and on the frequency components normalized by division by the maximum value;
wherein, in the maximum value search step,
for each predetermined time segment, when the current maximum value (the maximum value found in that time segment) is stored by the maximum value storing step, a past maximum value (the maximum value stored by the maximum value storing step in a predetermined earlier time segment) is acquired, and the current maximum value is converted into a value associated with the past maximum value.
A program characterized by the above.
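
The maximum value search, storage, and conversion described in this claim can be sketched in Python as follows. This is only an illustration under stated assumptions: the "value associated with the past maximum value" is taken to be a simple difference, the maximum is taken over absolute values, the band layout is a hypothetical list of index ranges, and the spectrum conversion, quantization, and entropy encoding are omitted. The name `BandNormalizer` is not from the source.

```python
from typing import List, Optional, Sequence, Tuple


class BandNormalizer:
    """Per-band maximum search with differential coding of the maxima
    (illustrative only)."""

    def __init__(self, bands: Sequence[Tuple[int, int]]) -> None:
        # Hypothetical layout: each medium band is a (start, end) range of
        # sub-band indices, end exclusive.
        self.bands = list(bands)
        # Past maxima, one per medium band; None before the first segment.
        self.past_max: Optional[List[float]] = None

    def encode_segment(self, coeffs: Sequence[float]
                       ) -> Tuple[List[float], List[float]]:
        # Maximum value search (assumption: maximum of absolute values).
        current_max = [max(abs(c) for c in coeffs[s:e]) for s, e in self.bands]
        # Normalize each band by its maximum; guard against an all-zero band.
        normalized: List[float] = []
        for (s, e), m in zip(self.bands, current_max):
            scale = m if m > 0 else 1.0
            normalized.extend(c / scale for c in coeffs[s:e])
        # Convert the current maximum to a value tied to the past maximum
        # (assumption: a difference; the first segment is sent unchanged).
        if self.past_max is None:
            coded_max = list(current_max)
        else:
            coded_max = [m - p for m, p in zip(current_max, self.past_max)]
        # Maximum value storing step.
        self.past_max = current_max
        return coded_max, normalized


encoder = BandNormalizer([(0, 2), (2, 5)])
coded1, norm1 = encoder.encode_segment([4.0, -8.0, 1.0, 2.0, -0.5])
coded2, norm2 = encoder.encode_segment([2.0, 4.0, 4.0, 0.0, 1.0])
```

When band maxima change slowly from one time segment to the next, the coded differences cluster near zero, which is the kind of distribution an entropy coder can represent compactly.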

A program for causing a computer to execute:
A receiving step of receiving a code generated by entropy-encoding deformed spectrum data, which is the result of applying a deformation including per-band normalization to the quantized spectrum of a speech signal for each predetermined time segment, together with the normalization value, which is the value used for the normalization;
A decoding step of decoding the deformed spectrum data and the normalization value for each predetermined time segment from the code by a decoding method corresponding to the entropy encoding;
An inverse deformation step of restoring the quantized spectrum for each predetermined time segment from the decoded deformed spectrum data, using the decoded normalization value;
A normalization value storing step of storing the normalization value;
A discrete spectrum inverse transform step of restoring the speech signal from the restored quantized spectrum;
wherein, in the inverse deformation step,
for each predetermined time segment, when the current normalization value (the normalization value decoded in that time segment) is stored by the normalization value storing step, a past normalization value (the normalization value stored by the normalization value storing step in a predetermined earlier time segment) is acquired, and the quantized spectrum is restored based on the current normalization value and the past normalization value.
A program characterized by the above.