JPH0775339B2

JPH0775339B2 - Speech coding method and apparatus

Info

Publication number: JPH0775339B2
Application number: JP4305234A
Authority: JP
Inventors: 功手嶋
Original assignee: 株式会社小電力高速通信研究所
Priority date: 1992-11-16
Filing date: 1992-11-16
Publication date: 1995-08-09
Anticipated expiration: 2010-08-09
Also published as: JPH06164520A

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a speech coding method and apparatus for voice communication in which a speech signal is converted into digital form for transmission. By reducing the amount of information, the invention lowers the transmission rate and narrows the occupied bandwidth, making more effective use of the transmission band.

[0002]

[Prior Art] In a conventional speech coding scheme, a discretized speech signal is divided into processing units (frames), weighted by windowing, and, with frames overlapped by 3/4, orthogonally transformed from the time domain to the frequency domain by a fast Fourier transform (FFT). To map physical frequency [Hz] to frequency expressed on the mel scale [mel], the mel scale being one of the characteristics of human hearing, the mel frequency axis is divided equally into about 40 bands and the physical frequency axis is divided correspondingly. From the multiple spectral components falling in each band of the divided physical frequency axis, representative spectral information for that band, that is, mel-scaled spectral information, is extracted and encoded by ADPCM. However, this scheme has a low compression ratio, and effective use of the transmission band cannot be expected.

[0003] A method was therefore devised in which the logarithm (LOG) of the mel-scaled spectral information is taken and an FFT is applied again to compute the mel-scaled cepstrum, whose low-order components (spectral envelope information) are then vector-quantized. This made substantial compression possible. However, the processing required for vector quantization in this method is enormous, and the method was very difficult to implement.

[0004]

[Problem to Be Solved by the Invention] The present invention has been made in view of the above circumstances, and its object is to provide a speech coding method and apparatus that achieve a relatively high compression ratio and are easy to implement.

[0005]

[Means for Solving the Problem] To solve the above problem, the present invention provides a speech coding method in which a discretized speech signal is divided into processing units, orthogonally transformed from the time domain to the frequency domain by a fast Fourier transform, and mapped from physical frequency to frequency expressed on the mel scale. In this mapping, the mel frequency axis is divided equally into a plurality of bands and the physical frequency axis is divided correspondingly. From the multiple spectral components falling in each band of the divided physical frequency axis, mel-scaled spectral information consisting of the position information of the maximum-amplitude spectral component in that band and the combined spectral power of that band is extracted and encoded, thereby reducing the amount of information.

[0006] Further, in the above speech coding method, when the discretized and framed speech signal is transformed from the time domain to the frequency domain using the fast Fourier transform, the complex spectral information is expressed in the form of power spectral information, so that phase information is discarded and the number of transmission parameters is reduced. At decoding, the power spectral information is placed in the imaginary part and the real part is set to 0 before the inverse fast Fourier transform is performed, so that the signal retransformed to the time axis is a sum of sine waves.

[0007] Further, in the above speech coding method, the mel-scaled spectral value with the largest absolute value is selected as the maximum value information, and this maximum value information is compared with the absolute value of each of the other mel-scaled spectral values. Each value is encoded with 2 bits in four levels: 00 if the difference is at least 1/2 of the maximum value, 01 if it is at least 1/4 and less than 1/2, 10 if it is at least 1/8 and less than 1/4, and 11 if it is less than 1/8.
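The four-level rule above can be sketched in a few lines of Python. This is only an illustrative reading, not the patent's implementation: it assumes the compared quantity is each value's magnitude relative to the maximum (the reading that matches the decoder levels 1/1, 1/2, 1/4, 1/8 given in paragraph [0013]), it encodes the maximum itself as 00 rather than excluding it, and the names `encode_2bit`, `decode_2bit`, and `DECODE_LEVELS` are hypothetical.

```python
# Assumption: decoder levels taken from paragraph [0013] (1/1, 1/2, 1/4, 1/8 of the maximum).
DECODE_LEVELS = (1.0, 0.5, 0.25, 0.125)


def encode_2bit(values):
    """Return (max_abs, codes), with one 2-bit code (0..3) per input value."""
    max_abs = max(abs(v) for v in values)
    codes = []
    for v in values:
        # Assumption: compare the magnitude relative to the maximum.
        ratio = abs(v) / max_abs if max_abs > 0 else 0.0
        if ratio >= 0.5:
            codes.append(0b00)
        elif ratio >= 0.25:
            codes.append(0b01)
        elif ratio >= 0.125:
            codes.append(0b10)
        else:
            codes.append(0b11)
    return max_abs, codes


def decode_2bit(max_abs, codes):
    """Map each 2-bit code back to a fixed fraction of the maximum."""
    return [max_abs * DECODE_LEVELS[c] for c in codes]
```

Under these assumptions each band costs 2 bits plus one shared maximum per frame, and the decoded value is the upper representative of its level (for example, a value at exactly half the maximum decodes to the maximum).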

[0008] Further, the invention is characterized by comprising: an A/D converter that converts a speech signal into digital data; a first buffer that turns the digital data output from the A/D converter into a signal divided into processing units; a windowing unit that weights the data output from the first buffer; a fast Fourier transformer that transforms the data output from the windowing unit from the time domain into spectral information in the frequency domain; a mel scaler that converts the spectral information output from the fast Fourier transformer from the physical frequency axis to the mel frequency axis, expressed in mel, to obtain mel-scaled spectral information; a power spectrum encoder that obtains, from the mel-scaled spectral information output from the mel scaler, maximum value information (the value with the largest absolute value in the mel-scaled spectral information) and encoded mel-scaled spectral information; a multiplexer that multiplexes the maximum value information and the encoded mel-scaled spectral information extracted by the power spectrum encoder together with the position information of the mel-scaled spectrum output from the mel scaler, and transmits the result as digital data; a demultiplexer that demultiplexes the digital data transmitted from the multiplexer to obtain the maximum value information, the encoded mel-scaled spectral information, and the position information of the mel-scaled spectrum; a power spectrum decoder that decodes mel-scaled spectral information from the maximum value information and the encoded mel-scaled spectral information output from the demultiplexer; a mel scale decoder that decodes spectral information on the physical frequency axis from the mel-scaled spectral information output from the power spectrum decoder and the position information of the mel-scaled spectrum output from the demultiplexer; an inverse fast Fourier transformer that transforms the spectral information output from the mel scale decoder from the frequency domain into digital data in the time domain; and a D/A converter that converts the digital data output from the inverse fast Fourier transformer, via a second buffer, into an analog speech signal.

[0009]

[Operation] With the above means, the present invention performs a kind of waveform coding in the frequency domain that achieves compression by exploiting the characteristics of human hearing. The speech signal is transformed into the frequency domain, and when physical frequency is mapped onto the mel frequency axis according to the mel scale, one of the characteristics of human hearing, the phase information of the complex spectrum is discarded and the mel-scaled spectral information is encoded with 2 bits, yielding a large reduction in the amount of information.

[0010]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 shows the circuit configuration of one embodiment of the present invention. In the figure, 20 is a speech encoder and 21 is a speech decoder. The speech encoding process is described first.

[0011] A continuous speech signal S1 is sampled and quantized by an A/D (analog/digital) converter 1 to produce a discrete signal S2. The discrete signal S2 is then passed through a buffer 2 to produce a signal S3 divided into processing units (frames). The frame length is set to about 20 to 40 msec in consideration of the stationarity of the speech signal. Next, to improve analysis accuracy in the transformation to the frequency domain, the signal S3 is weighted by a suitable windowing unit 3. Examples of windows include the Hamming window and the Hanning window. The windowed signal S4 of each processing unit is then transformed into the frequency domain. Because a transformation method with high resolution and fast execution is desirable, a fast Fourier transform (FFT) unit 4 is used. The number of FFT analysis points is set to about 256, based on the trade-off between analysis time and resolution. A 256-point FFT yields 128 complex spectral components. The spacing of these components is, for example, 31.25 Hz for a sampling frequency of 8 kHz and a frame of 32 ms. Next, the spectral information S5, S6 computed by the FFT unit 4 is mapped by a mel scaler 5 from the physical frequency axis to the mel frequency axis, expressed in mel. The mel scale reflects the fact that, in human hearing, the perceived pitch of a sound is not linear in physical frequency, as shown in FIG. 2. It is approximated by

$$f_m = 1000 \log_2\left(1 + \frac{f}{1000}\right)$$

where f is the physical frequency [Hz] and f_m is the mel frequency [mel].
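As a quick check of the figures in this paragraph, the mel approximation and the FFT component spacing can be written out as follows. This is a minimal sketch. The function names are hypothetical, and `mel_to_hz` is simply the algebraic inverse of the stated formula, which the source does not give explicitly.

```python
import math


def hz_to_mel(f_hz):
    """Mel approximation stated in the text: f_m = 1000 * log2(1 + f / 1000)."""
    return 1000.0 * math.log2(1.0 + f_hz / 1000.0)


def mel_to_hz(f_mel):
    """Algebraic inverse of hz_to_mel (derived here, not stated in the source)."""
    return 1000.0 * (2.0 ** (f_mel / 1000.0) - 1.0)


def bin_spacing_hz(sample_rate_hz, fft_points):
    """Frequency spacing between adjacent FFT components."""
    return sample_rate_hz / fft_points
```

With these definitions, `bin_spacing_hz(8000, 256)` reproduces the 31.25 Hz spacing quoted above (256 samples at 8 kHz span 32 ms), and 1000 Hz maps to 1000 mel.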

[0012] Specific means for achieving compression using the mel scale are now described. First, the mel frequency axis is divided equally. The fewer the divisions, the higher the compression ratio. Next, as shown in FIG. 3, the physical frequency axis is divided in correspondence with the sections into which the mel frequency axis was divided. Representative spectral (mel-scaled spectral) information is computed from each section of the divided physical frequency axis. The method of computing the mel-scaled spectral information is described with reference to FIG. 4. First, the complex spectral information S5, S6 computed by the FFT unit 4 is expressed as power by a power spectrum calculator 50, giving spectral information S7 with the phase angle removed. This is because human hearing is insensitive to instantaneous phase, and discarding the phase does not affect the recognition of speech information. Next, a mel scale divider 51 divides the spectral information S7 to obtain mel band spectral information S8. A maximum value position detector 52 then detects, from the spectral information S8, position information S9 of the maximum-amplitude spectral component within each mel band. Next, a combiner 53 obtains the combined spectral power of each mel band. This combined spectral power becomes the mel-scaled spectral information S10. For example, with 15 divisions, the 128 complex spectral components are represented by 15 mel-scaled spectral values and 15 position values, so that at this point the amount of information is compressed to about 1/4. Next, the power spectrum encoder 6 encodes the mel-scaled spectral information S10. The encoder 6 is described in detail. First, the value with the largest absolute value in the mel-scaled spectral information is extracted as maximum value information S11. Next, this maximum value information is compared with the absolute value of each mel-scaled spectral value, and each value is encoded with 2 bits in four levels: 00 if the difference is at least 1/2 of the maximum value, 01 if it is at least 1/4 and less than 1/2, 10 if it is at least 1/8 and less than 1/4, and 11 if it is less than 1/8. This yields encoded mel-scaled spectral information S12 with a greatly reduced amount of information. For example, with 15 divisions, the data at this point is compressed to about 1/25 of the original information. Finally, the maximum value information S11, the encoded mel-scaled spectral information S12 (one per division), and the position information S9 of the mel-scaled spectrum (one per division) are multiplexed by a multiplexer 7 and transmitted as digital data S13.
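The band-wise extraction performed by units 51 to 53 can be sketched as below. This is a minimal illustration under stated assumptions, not the circuit of FIG. 4: each FFT component is assigned to a band by its mel frequency, the "combined spectral power" is taken to be a plain sum over the band, the position is reported as an absolute bin index rather than an offset within the band, and the name `mel_band_features` is hypothetical.

```python
import math


def hz_to_mel(f_hz):
    # Mel approximation from paragraph [0011].
    return 1000.0 * math.log2(1.0 + f_hz / 1000.0)


def mel_band_features(power_spectrum, sample_rate_hz, n_bands):
    """Return one (peak_bin, band_power) pair per equal-width mel band.

    power_spectrum holds the power of the components from 0 Hz up to,
    but not including, sample_rate_hz / 2 (for example, 128 values).
    """
    n_bins = len(power_spectrum)
    bin_hz = sample_rate_hz / (2.0 * n_bins)
    band_width_mel = hz_to_mel(sample_rate_hz / 2.0) / n_bands

    # Assign each component to the mel band that contains its frequency.
    bands = [[] for _ in range(n_bands)]
    for k in range(n_bins):
        b = min(int(hz_to_mel(k * bin_hz) / band_width_mel), n_bands - 1)
        bands[b].append(k)

    features = []
    for bins in bands:
        if not bins:
            # A band can be empty when n_bands is large relative to n_bins.
            features.append((None, 0.0))
            continue
        peak_bin = max(bins, key=lambda k: power_spectrum[k])   # position information
        band_power = sum(power_spectrum[k] for k in bins)       # assumption: plain sum
        features.append((peak_bin, band_power))
    return features
```

For 128 components at 8 kHz and 15 bands, this yields 15 positions and 15 power values per frame, matching the counts in the example above. Because power is monotonic in amplitude, the largest-power component is also the maximum-amplitude one.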

[0013] Next, the speech decoder 21 is described. First, the received digital data S13 is demultiplexed by a demultiplexer 8. Next, a power spectrum decoder 9 decodes the encoded mel-scaled spectral information S12 by expressing each value as 1/1, 1/2, 1/4, or 1/8 of the maximum value, with the maximum value information S11 as the reference, to obtain mel-scaled spectral information S14. A mel scale decoder 10 then restores the spectrum on the original physical frequency axis by placing the obtained mel-scaled spectral information S14, according to the position information S9, at the positions where the maximum-amplitude spectral components were located within the bands divided at encoding. This operation largely preserves the pitch structure and intonation of the speech signal. Next, an inverse fast Fourier transform (IFFT) unit 11 transforms the spectrum from the frequency domain to the time domain. The spectrum should properly be complex and carry a phase angle, but as noted for encoding, the instantaneous phase is discarded and the spectrum is expressed by its power value. Therefore, so that frames join smoothly at their edges after retransformation to the time domain, the spectral information is placed in the imaginary part S16 and the real part S15 is set entirely to 0. As a result, the data S17 retransformed to the time axis by the IFFT unit 11 is a sum of sine waves, every component starting from amplitude 0 at the beginning of the frame and ending at 0 at its end, so that adjacent frames are connected smoothly. It follows that no overlap processing for frame connection is needed when dividing the signal into frames at encoding. Next, the digital data S18 obtained via a buffer 12 is D/A-converted by a D/A converter 13 to obtain an analog decoded speech signal S19, and processing ends.
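Why a purely imaginary spectrum gives a sum of sine waves can be checked with a small inverse DFT. The sketch below is illustrative only. It assumes that the mirrored negative-frequency component is filled with the opposite sign so that the output is real (the source does not say how those components are set), it uses a naive O(N²) inverse DFT instead of an FFT for self-containment, and the names `build_spectrum` and `inverse_dft` are hypothetical.

```python
import cmath
import math


def build_spectrum(n_fft, positions, magnitudes):
    """Place each decoded value in the imaginary part; real parts stay 0."""
    spectrum = [0j] * n_fft
    for k, a in zip(positions, magnitudes):
        if not 0 < k < n_fft // 2:
            raise ValueError("component index must satisfy 0 < k < n_fft / 2")
        spectrum[k] = complex(0.0, a)
        # Assumption: odd-symmetric mirror so that the time signal is real.
        spectrum[n_fft - k] = complex(0.0, -a)
    return spectrum


def inverse_dft(spectrum):
    """Naive inverse DFT: x[t] = (1/N) * sum_k X[k] * exp(j*2*pi*k*t/N)."""
    n = len(spectrum)
    return [
        sum(x_k * cmath.exp(2j * math.pi * k * t / n)
            for k, x_k in enumerate(spectrum)) / n
        for t in range(n)
    ]
```

Under these assumptions a value a at index k contributes -(2a/N)·sin(2πkt/N) to the frame: the imaginary parts of the output vanish, every component is zero at t = 0, and each completes a whole number of cycles per frame, which is consistent with the smooth frame joins described above.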

[0014]

[Effect of the Invention] As described above, according to the present invention, when a speech signal is converted into digital form and transmitted for voice communication, the amount of information can be greatly reduced by compressing the spectral information using the mel scale, one of the characteristics of human hearing, by discarding the phase information, and by quantizing the transmission parameters with 2 bits. The practical benefit is therefore large.

[Brief description of drawings]

FIG. 1 is a configuration diagram showing an embodiment of the present invention.

FIG. 2 is a characteristic diagram showing an example of the relationship between the mel scale and physical frequency according to the present invention.

FIG. 3 is a characteristic diagram showing an example of the division of the physical frequency axis corresponding to the division of the mel scale axis according to the present invention.

FIG. 4 is a configuration diagram showing an example of the mel scaling process according to the present invention.

[Explanation of symbols]

1 ... A/D converter, 2, 12 ... buffer, 3 ... windowing unit, 4 ... FFT unit, 5 ... mel scaler, 6 ... encoder, 7 ... multiplexer, 8 ... demultiplexer, 9 ... decoder, 10 ... mel scale decoder, 11 ... IFFT unit, 13 ... D/A converter, 20 ... speech encoder, 21 ... speech decoder, 50 ... power spectrum calculator, 51 ... divider, 52 ... maximum value position detector, 53 ... combiner.

Claims

[Claims]

1. A speech coding method in which a discretized speech signal is divided into processing units, orthogonally transformed from the time domain to the frequency domain by a fast Fourier transform, and mapped from physical frequency to frequency expressed on the mel scale, characterized in that the mel frequency axis is divided into a plurality of bands and the physical frequency axis is divided correspondingly, and in that, from the multiple spectral components existing in each band of the divided physical frequency axis, mel-scaled spectral information consisting of the position information of the maximum-amplitude spectral component of that band and the combined spectral power of that band is extracted and encoded to reduce the amount of information.

2. The speech coding method according to claim 1, wherein, when the discretized and framed speech signal is transformed from the time domain to the frequency domain using the fast Fourier transform, the complex spectral information is expressed in the form of power spectral information so that phase information is discarded and the transmission parameters are reduced, and wherein, at decoding, the power spectral information is placed in the imaginary part, the real part is set to 0, and the inverse fast Fourier transform is performed so that the signal retransformed to the time axis is a sum of sine waves.

3. The speech coding method according to claim 1, wherein the mel-scaled spectral value with the largest absolute value is selected as maximum value information, the maximum value information is compared with the absolute values of the other mel-scaled spectral values, and each value is encoded with 2 bits in four levels: 00 if the difference is at least 1/2 of the maximum value, 01 if it is at least 1/4 and less than 1/2, 10 if it is at least 1/8 and less than 1/4, and 11 if it is less than 1/8.

4. A speech coding apparatus comprising: an A/D converter that converts a speech signal into digital data; a first buffer that turns the digital data output from the A/D converter into a signal divided into processing units; a windowing unit that weights the data output from the first buffer; a fast Fourier transformer that transforms the data output from the windowing unit from the time domain into spectral information in the frequency domain; a mel scaler that converts the spectral information output from the fast Fourier transformer from the physical frequency axis to the mel frequency axis, expressed in mel, to obtain mel-scaled spectral information; a power spectrum encoder that obtains, from the mel-scaled spectral information output from the mel scaler, maximum value information having the largest absolute value in the mel-scaled spectral information and encoded mel-scaled spectral information; a multiplexer that multiplexes the maximum value information and the encoded mel-scaled spectral information extracted by the power spectrum encoder together with the position information of the mel-scaled spectrum output from the mel scaler, and transmits the result as digital data; a demultiplexer that demultiplexes the digital data transmitted from the multiplexer to obtain the maximum value information, the encoded mel-scaled spectral information, and the position information of the mel-scaled spectrum; a power spectrum decoder that decodes mel-scaled spectral information from the maximum value information and the encoded mel-scaled spectral information output from the demultiplexer; a mel scale decoder that decodes spectral information on the physical frequency axis from the mel-scaled spectral information output from the power spectrum decoder and the position information of the mel-scaled spectrum output from the demultiplexer; an inverse fast Fourier transformer that transforms the spectral information output from the mel scale decoder from the frequency domain into digital data in the time domain; and a D/A converter that converts the digital data output from the inverse fast Fourier transformer, via a second buffer, into an analog speech signal.