JP3050978B2

JP3050978B2 - Audio coding method

Info

Publication number: JP3050978B2
Application number: JP3335009A
Authority: JP
Inventors: 浩桂川; 賢一郎細田; 弘美青柳; 義博有山
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1991-12-18
Filing date: 1991-12-18
Publication date: 2000-06-12
Anticipated expiration: 2015-06-12
Also published as: JPH05165500A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a method for the compression coding of speech signals.

[0002]

[Prior Art] For high-efficiency coding of speech signals at bit rates of 8 kbit/s or less, code-excited linear prediction coding (hereinafter CELP), proposed by Atal et al., is an effective technique. CELP represents a speech signal by vocal tract parameters and excitation source parameters. Vector quantization of the excitation source parameters using two codebooks, a statistical codebook and an adaptive codebook, is disclosed in the following reference.

[0003] Reference: N. S. Jayant and J. H. Chen, "Speech Coding with Time-Varying Bit Allocations to Excitation and LPC Parameters," Proc. ICASSP-89 (1989).

[0004]

[Problems to Be Solved by the Invention] A speech signal can be regarded as consisting of stationary voiced segments and transient unvoiced segments, whose statistical properties differ greatly. In CELP compression coding as well, the adaptive codebook contributes very strongly to quality for voiced sounds, whereas for unvoiced sounds it contributes almost nothing and the accuracy of the vocal tract parameters matters more instead. For effective compression coding it is therefore desirable to provide separate coding methods for voiced segments and for unvoiced sounds.

[0005] Accordingly, the present invention aims to provide a high-quality, high-efficiency speech compression coding method by allocating information preferentially to the adaptive excitation code for voiced sounds and to the vocal tract parameters for unvoiced sounds.

[0006]

[Means for Solving the Problems] The present invention is an improvement of CELP specified by the following features.

[0007] The present invention has means for obtaining vocal tract parameters (a vector) by linear prediction analysis of an input speech signal (a vector). It has vocal tract parameter quantization means that quantizes the vocal tract parameters of the input speech signal and outputs the quantized vocal tract parameters (main vocal tract parameters) together with the corresponding vocal tract parameter code. It has vocal tract correction parameter quantization means that quantizes the error between the vocal tract parameters of the input speech signal and the quantized vocal tract parameters, either after interpolation or without interpolation, and outputs the quantized vocal tract correction parameters together with the corresponding vocal tract parameter correction code. It has an adaptive codebook that stores adaptive excitation vectors representing excitation source parameters of the past input speech signal, and a statistical codebook that stores statistical excitation vectors, which are predetermined excitation source parameters. Finally, it creates a synthesized speech signal from the quantized vocal tract parameters and the excitation source parameters, and determines the excitation source output code by evaluating the error between the input speech signal and the synthesized speech signal (a vector).

[0008] The present invention determines whether the input is voiced or unvoiced by long-period analysis of the input speech signal. For voiced sound, it creates the synthesized speech signal from the quantized vocal tract parameters and the sum vector of the adaptive excitation vector and the statistical excitation vector, and determines the adaptive excitation code and the statistical excitation code by evaluating the error between the input speech signal and the synthesized speech signal.

[0009] For unvoiced sound, it creates the synthesized speech signal from the sum of the quantized vocal tract parameters and the quantized vocal tract correction parameters and from the statistical excitation vector, and determines the statistical excitation code by evaluating the error between the input speech signal and the synthesized speech signal. Depending on whether the sound is voiced or unvoiced, either the adaptive excitation code or the vocal tract parameter correction code, respectively, is multiplexed with the other codes to form the output word.

[0010] In the present invention, of the codes multiplexed into the output word, the vocal tract parameter code and the voiced/unvoiced identification code (the result of the voiced/unvoiced decision) may be information updated every frame, and the vocal tract parameter correction code, the adaptive excitation code, and the statistical excitation code may be information updated every subframe.

[0011]

[Operation] In the coding method of the present invention, the long-period correlation of the speech is first analyzed to determine whether it is voiced or unvoiced.

[0012] If the sound is voiced, the excitation signal vector, which has long-period correlation, is coded using the adaptive codebook.

[0013] If the sound is unvoiced, the adaptive codebook is not used. Instead, a codebook for correcting the vocal tract parameters is used to code the error introduced by quantization, interpolation, and the like of the vocal tract parameters.

[0014] The voiced/unvoiced identification code may be updated either every frame or every subframe. In the former case the interpolation can be moved to a later stage, so the vocal tract parameters are easier to handle, and there is the advantage that conventional methods can be used to create the codebook.

[0015]

[Embodiments] FIG. 1 is a block diagram of an encoder to which the present invention is applied.

[0016] In FIG. 1, the A/D-converted input speech signal sequence is input in units of a specific frame length. The vocal tract analyzer 101 analyzes the input speech signal to obtain vocal tract parameters, which the vocal tract parameter quantizer 102 quantizes using its built-in quantization table.

[0017] The vocal tract parameter code C1, which corresponds to the quantized vocal tract parameters P1, is sent to the multiplexer 113 once per frame.

[0018] The quantized vocal tract parameters P1 are interpolated by the interpolator 103 for each subframe, a subframe being a further subdivision of the frame, and are used in that form (interpolated vocal tract parameters Ph).
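The subframe interpolation can be pictured with a minimal sketch. The description does not state which interpolation rule the interpolator 103 applies, so linear interpolation between the previous and current frame's quantized parameters is an assumption here, and the function name `interpolate_subframes` is hypothetical.

```python
from typing import List


def interpolate_subframes(prev_p1: List[float], cur_p1: List[float],
                          n_subframes: int) -> List[List[float]]:
    """Return one interpolated parameter vector (Ph) per subframe.

    Assumption: linear interpolation from the previous frame's quantized
    parameters to the current frame's, reaching cur_p1 at the last subframe.
    """
    result = []
    for k in range(1, n_subframes + 1):
        w = k / n_subframes  # weight of the current frame's parameters
        result.append([(1.0 - w) * p + w * c for p, c in zip(prev_p1, cur_p1)])
    return result


ph = interpolate_subframes([0.0, 2.0], [4.0, 6.0], 4)
print(ph)
```

Whatever rule is actually used, the encoder and decoder must apply the same one, because only the per-frame code C1 is transmitted.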

[0019] The vocal tract parameters used in the current subframe contain errors due to quantization and interpolation. The vocal tract analysis is therefore repeated for each subframe, and the subtractor 114 obtains the error as the difference from the error-free vocal tract parameters obtained by analyzing the current subframe. The vocal tract correction parameter quantizer 104 quantizes this error using its built-in quantization table to obtain the vocal tract correction parameters P2.
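As an illustration of this correction step, the following sketch subtracts the interpolated parameters from the freshly analyzed ones and picks the nearest entry of a quantization table. The squared-error distance, the table contents, and the names `nearest_index` and `quantize_correction` are assumptions made for the example; the description only says that a built-in quantization table is used.

```python
from typing import List, Tuple


def nearest_index(vector: List[float], table: List[List[float]]) -> int:
    """Index of the table entry with the smallest squared distance to vector."""
    return min(range(len(table)),
               key=lambda i: sum((v - t) ** 2 for v, t in zip(vector, table[i])))


def quantize_correction(analysed: List[float], interpolated: List[float],
                        table: List[List[float]]) -> Tuple[int, List[float]]:
    """Return (correction code C2, quantized correction parameters P2)."""
    # Role of the subtractor 114: error between the per-subframe analysis
    # result and the quantized, interpolated parameters.
    residual = [a - h for a, h in zip(analysed, interpolated)]
    c2 = nearest_index(residual, table)
    return c2, table[c2]


# Hypothetical three-entry correction table for a two-dimensional parameter.
table = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
c2, p2 = quantize_correction([0.52, 0.28], [0.4, 0.4], table)
print(c2, p2)
```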

[0020] For unvoiced sound, the vocal tract parameter correction code C2, which corresponds to the vocal tract correction parameters P2, is sent to the multiplexer 113 once per subframe.

[0021] The long-period analyzer 105 computes the long-period correlation of the input speech signal, determines for each subframe whether the current segment of the input speech signal is voiced or unvoiced, and sends the resulting voiced/unvoiced identification code C3 to the multiplexer 113.
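One way such a long-period decision could be made is sketched below. The description does not give the correlation measure, the lag range, or the decision threshold, so the normalized cross-correlation, the lag limits (20 to 147 samples), and the threshold 0.5 are all illustrative assumptions, as are the function names.

```python
import math
from typing import List, Tuple


def long_period_correlation(signal: List[float], start: int,
                            length: int, lag: int) -> float:
    """Normalized correlation between a subframe and the span `lag` samples earlier."""
    cur = signal[start:start + length]
    past = signal[start - lag:start - lag + length]
    num = sum(a * b for a, b in zip(cur, past))
    den = math.sqrt(sum(a * a for a in cur) * sum(b * b for b in past))
    return num / den if den > 0.0 else 0.0


def classify_subframe(signal: List[float], start: int, length: int,
                      min_lag: int = 20, max_lag: int = 147,
                      threshold: float = 0.5) -> Tuple[bool, int]:
    """Return (voiced flag, best lag); lag range and threshold are hypothetical."""
    best_lag, best_corr = min_lag, -1.0
    for lag in range(min_lag, min(max_lag, start) + 1):
        corr = long_period_correlation(signal, start, length, lag)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_corr >= threshold, best_lag


# A waveform that repeats every 40 samples is classified as voiced.
periodic = [float((n % 40) - 20) for n in range(400)]
print(classify_subframe(periodic, 200, 40))
```

A strongly periodic segment yields a correlation near 1 at its pitch lag, while a segment with no repeating structure stays below the threshold and takes the unvoiced path.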

[0022] If the segment is judged to be voiced, switch 106 is closed and switch 107 is opened. The adder 115 adds the adaptive excitation vector from the adaptive codebook 108 and the statistical excitation vector from the statistical codebook 109 to form the excitation vector. The synthesis filter 110, using the uncorrected interpolated vocal tract parameters Ph, synthesizes a synthesized speech signal from this excitation vector, and the subtractor 116 and the error calculator 111 compute the error between the synthesized speech signal and the input speech signal.

[0023] Based on the errors obtained by the error calculator 111, the minimum error selector 112 selects the optimal adaptive excitation vector and the optimal statistical excitation vector.

[0024] The statistical excitation code C5 corresponding to the statistical excitation vector and the adaptive excitation code C4 corresponding to the adaptive excitation vector are sent to the multiplexer 113 every subframe.

[0025] If the segment is judged to be unvoiced, switch 106 is opened and switch 107 is closed, so that the excitation vector consists only of the statistical excitation vector from the statistical codebook 109. The adder 117 adds the vocal tract parameters P1 and the vocal tract correction parameters P2 to produce corrected vocal tract parameters. The synthesis filter 110, using the corrected vocal tract parameters, synthesizes a synthesized speech signal from the statistical-only excitation vector, and the error calculator 111 obtains the error relative to the input speech signal.

[0026] For unvoiced sound, the minimum error selector 112 selects the optimal excitation vector from the statistical codebook 109 only.
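The analysis-by-synthesis search described for the voiced and unvoiced cases can be sketched as follows. Several points are assumptions made for illustration only: the vocal tract parameters are treated directly as all-pole filter coefficients a[k] with A(z) = 1 + sum of a[k] z^-k, the search is an exhaustive joint search, excitation gains and perceptual weighting are omitted, and the error is a plain squared error. The description does not specify any of these details.

```python
from typing import List, Optional, Tuple


def synthesize(excitation: List[float], lpc: List[float]) -> List[float]:
    """All-pole synthesis: s[n] = e[n] - sum_k lpc[k-1] * s[n-k], zero initial state."""
    out: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


def search_excitation(target: List[float], lpc: List[float],
                      adaptive_cb: List[List[float]],
                      statistical_cb: List[List[float]],
                      voiced: bool) -> Tuple[float, Optional[int], int]:
    """Return (minimum error, adaptive index or None, statistical index)."""
    if voiced:
        # Switch to the adaptive codebook closed: both codebooks contribute.
        adaptive_choices = list(enumerate(adaptive_cb))
    else:
        # Unvoiced: the statistical vector alone forms the excitation.
        adaptive_choices = [(None, [0.0] * len(target))]
    best: Optional[Tuple[float, Optional[int], int]] = None
    for a_idx, a_vec in adaptive_choices:
        for s_idx, s_vec in enumerate(statistical_cb):
            excitation = [x + y for x, y in zip(a_vec, s_vec)]
            synth = synthesize(excitation, lpc)
            err = sum((t - s) ** 2 for t, s in zip(target, synth))
            if best is None or err < best[0]:
                best = (err, a_idx, s_idx)
    assert best is not None  # statistical_cb must not be empty
    return best


lpc = [-0.5]  # hypothetical first-order filter
adaptive_cb = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
statistical_cb = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
target = synthesize([0.0, 1.0, 0.0, 1.0], lpc)
print(search_excitation(target, lpc, adaptive_cb, statistical_cb, voiced=True))
```

For an unvoiced subframe the caller would pass the corrected vocal tract parameters as `lpc` and set `voiced=False`, so that the bits otherwise spent on the adaptive index are available for the correction code.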

[0027] The multiplexer 113 multiplexes the codes obtained as described above, namely the vocal tract parameter code C1, the voiced/unvoiced identification code C3, the statistical excitation code C5, and either the adaptive excitation code C4 (for voiced sound) or the vocal tract parameter correction code C2 (for unvoiced sound), and transmits them over the communication line.
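The point of this multiplexing scheme is that the adaptive excitation code and the vocal tract parameter correction code never appear together, so they can share one field of the output word. A minimal packing sketch follows; the bit widths, the field order, and the function names are hypothetical, since the description gives no bit allocation.

```python
from typing import List, Tuple

# Hypothetical field widths in bits; the description does not specify them.
C1_BITS = 10    # vocal tract parameter code, once per frame
SLOT_BITS = 7   # shared field: adaptive excitation code or correction code
C5_BITS = 7     # statistical excitation code

Subframe = Tuple[bool, int, int]  # (voiced flag, shared-field code, statistical code)


def multiplex(c1: int, subframes: List[Subframe]) -> Tuple[int, int]:
    """Pack one frame into an integer word; returns (word, number of bits)."""
    if not 0 <= c1 < (1 << C1_BITS):
        raise ValueError("C1 out of range")
    word, nbits = c1, C1_BITS
    for voiced, slot, c5 in subframes:
        for value, width in ((int(voiced), 1), (slot, SLOT_BITS), (c5, C5_BITS)):
            if not 0 <= value < (1 << width):
                raise ValueError("code does not fit its field")
            word = (word << width) | value
            nbits += width
    return word, nbits


def demultiplex(word: int, n_subframes: int) -> Tuple[int, List[Subframe]]:
    """Inverse of multiplex: returns (C1, list of subframe tuples)."""
    subframes: List[Subframe] = []
    for _ in range(n_subframes):
        c5 = word & ((1 << C5_BITS) - 1)
        word >>= C5_BITS
        slot = word & ((1 << SLOT_BITS) - 1)
        word >>= SLOT_BITS
        voiced = bool(word & 1)
        word >>= 1
        subframes.append((voiced, slot, c5))
    subframes.reverse()
    return word, subframes


frame = [(True, 100, 5), (False, 3, 127)]
word, nbits = multiplex(513, frame)
print(nbits, demultiplex(word, 2))
```

The decoder reads the voiced flag first and then interprets the shared field accordingly, which is why the total bit count per subframe is the same in both modes.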

[0028] In this example, only the vocal tract parameter code C1 is per-frame information; the other codes C2 to C5 are per-subframe information.

[0029] FIG. 2 is a block diagram of a decoder corresponding to the encoder of FIG. 1.

[0030] In FIG. 2, the demultiplexer 201 separates the codeword received from the communication line into the vocal tract parameter code C1, the voiced/unvoiced identification code C3, the statistical excitation code C5, and either the adaptive excitation code C4 or the vocal tract parameter correction code C2, and sends them to the respective parts of the decoder. If the sound is voiced, switch 202 is connected to the adaptive codebook 205; if unvoiced, it is connected to the vocal tract correction parameter inverse quantizer 204.

[0031] The vocal tract parameter code C1 is inverse-quantized by the vocal tract parameter inverse quantizer 203 to give the vocal tract parameters P1, which the interpolator 207 then interpolates for each subframe.

[0032] If the voiced/unvoiced identification code C3 indicates voiced sound, switch 210 is closed and switch 209 is opened. The adder 211 adds the optimal adaptive excitation vector from the adaptive codebook 205 corresponding to the adaptive excitation code C4 and the optimal statistical excitation vector from the statistical codebook 206 corresponding to the statistical excitation code C5 to form the excitation vector, and the synthesis filter 208, using the uncorrected interpolated vocal tract parameters Ph, synthesizes the reproduced speech output.

[0033] Conversely, if the voiced/unvoiced identification code C3 indicates unvoiced sound, switch 210 is opened and switch 209 is closed. The vocal tract correction parameter inverse quantizer 204 inverse-quantizes the vocal tract parameter correction code C2 to obtain the vocal tract correction parameters P2, and the adder 212 uses them to correct the vocal tract parameters Ph. The excitation vector is formed only from the optimal statistical excitation vector from the statistical codebook 206 corresponding to the statistical excitation code C5, and the synthesis filter 208, using the corrected vocal tract parameters, synthesizes the reproduced speech output.
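The two decoding paths can be summarized in a short sketch. As assumptions for illustration, the vocal tract parameters are again treated directly as all-pole filter coefficients, the filter starts from a zero state in every subframe, and the update of the adaptive codebook with past excitation is left out. The names `decode_subframe` and `shared_code` are hypothetical.

```python
from typing import List


def synthesize(excitation: List[float], lpc: List[float]) -> List[float]:
    """All-pole synthesis: s[n] = e[n] - sum_k lpc[k-1] * s[n-k], zero initial state."""
    out: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


def decode_subframe(voiced: bool, ph: List[float], statistical_code: int,
                    shared_code: int,
                    adaptive_cb: List[List[float]],
                    statistical_cb: List[List[float]],
                    correction_table: List[List[float]]) -> List[float]:
    """Reconstruct one subframe; shared_code is the adaptive excitation code
    when voiced and the vocal tract parameter correction code when unvoiced."""
    stat_vec = statistical_cb[statistical_code]
    if voiced:
        # Voiced path: sum of both excitation vectors, uncorrected Ph.
        excitation = [a + s for a, s in zip(adaptive_cb[shared_code], stat_vec)]
        params = list(ph)
    else:
        # Unvoiced path: statistical excitation only, Ph corrected by P2.
        excitation = list(stat_vec)
        params = [h + d for h, d in zip(ph, correction_table[shared_code])]
    return synthesize(excitation, params)


adaptive_cb = [[1.0, 0.0, 0.0]]
statistical_cb = [[0.0, 1.0, 0.0]]
correction_table = [[0.5]]
print(decode_subframe(True, [-0.5], 0, 0, adaptive_cb, statistical_cb, correction_table))
print(decode_subframe(False, [-0.5], 0, 0, adaptive_cb, statistical_cb, correction_table))
```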

[0034] FIG. 3 is a block diagram of another encoder to which the present invention is applied. It differs from the example of FIG. 1 in that the voiced/unvoiced decision is made per frame rather than per subframe.

[0035] In FIG. 3, the A/D-converted input speech signal sequence is input in units of a specific frame length.

[0036] First, the vocal tract analyzer 101 analyzes the input speech signal to obtain vocal tract parameters. The long-period analyzer 105 computes the long-period correlation of the input speech signal and determines whether the current segment of the input speech signal is voiced or unvoiced. This decision (C3) is made per frame.

[0037] The vocal tract parameters of the input speech signal are quantized by the vocal tract parameter quantizer 102. The quantized vocal tract parameters P1 are sent to the multiplexer 113 once per frame (as the vocal tract parameter code C1).

[0038] If the result C3 of the voiced/unvoiced decision is unvoiced, the subtractor 314 obtains the error as the difference between the vocal tract parameters before quantization and the quantized vocal tract parameters P1, and the vocal tract correction parameter quantizer 304 quantizes this error to obtain the vocal tract correction parameters P2 for each subframe.

[0039] For voiced sound, switch 307 is opened and the quantized vocal tract parameters are interpolated by the interpolator 303 for each subframe obtained by further dividing the frame. For unvoiced sound, switch 307 is closed and the sum of the quantized vocal tract parameters P1 and the vocal tract correction parameters P2 is interpolated by the interpolator 303 and used.
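In this second arrangement the correction is applied before interpolation, which can be sketched as below. Linear interpolation from the previous frame's parameters is again an assumption, the correction is shown as a single vector per frame for simplicity, and the function names are hypothetical.

```python
from typing import List


def frame_parameters(p1: List[float], p2: List[float], voiced: bool) -> List[float]:
    """Parameters fed to the interpolator: P1 alone (switch 307 open, voiced)
    or P1 + P2 (switch 307 closed, unvoiced)."""
    if voiced:
        return list(p1)
    return [a + b for a, b in zip(p1, p2)]


def subframe_parameters(prev_frame: List[float], p1: List[float],
                        p2: List[float], voiced: bool,
                        n_subframes: int) -> List[List[float]]:
    """Assumed linear interpolation towards the (possibly corrected) frame parameters."""
    target = frame_parameters(p1, p2, voiced)
    result = []
    for k in range(1, n_subframes + 1):
        w = k / n_subframes
        result.append([(1.0 - w) * p + w * t for p, t in zip(prev_frame, target)])
    return result


print(subframe_parameters([0.0], [2.0], [2.0], voiced=False, n_subframes=2))
print(subframe_parameters([0.0], [2.0], [2.0], voiced=True, n_subframes=2))
```

Because the interpolator sits after the correction, the correction codebook only has to model quantization error, not interpolation error.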

[0040] As for coding of the excitation source, for voiced sound switch 106 is closed, and the adder 115 adds the adaptive excitation vector from the adaptive codebook 108 and the statistical excitation vector from the statistical codebook 109 to form the excitation vector. The synthesis filter 110 synthesizes a synthesized speech signal, and the subtractor 116 and the error calculator 111 compute the error between the synthesized speech signal and the input speech signal. From the errors obtained by the error calculator 111, the minimum error selector 112 selects the optimal adaptive excitation vector and the optimal statistical excitation vector.

[0041] If the sound is judged to be unvoiced, switch 106 is opened, the excitation vector is formed only from the statistical excitation vector from the statistical codebook 109, the synthesis filter 110 synthesizes a synthesized speech signal, and the subtractor 116 and the error calculator 111 obtain the error relative to the input speech signal. For unvoiced sound, the minimum error selector 112 selects the optimal excitation vector from the statistical codebook 109 only.

[0042] The multiplexer 113 multiplexes the codes obtained by the above units, namely the vocal tract parameter code C1, the voiced/unvoiced identification code C3, the statistical excitation code C5, and either the adaptive excitation code C4 or the vocal tract parameter correction code C2, and transmits them over the communication line.

[0043] Codes C1 and C3 are per-frame information, and codes C2, C4, and C5 are per-subframe information.

[0044] FIG. 4 is a block diagram of a decoder corresponding to the encoder of FIG. 3.

[0045] In FIG. 4, the demultiplexer 201 separates the codeword received from the communication line into the vocal tract parameter code C1, the voiced/unvoiced identification code C3, the statistical excitation code C5, and either the adaptive excitation code C4 or the vocal tract parameter correction code C2, and sends them to the respective units of the decoder. If the sound is voiced, switch 202 is connected to the adaptive codebook 205; if unvoiced, it is connected to the vocal tract correction parameter inverse quantizer 204.

[0046] The vocal tract parameter code C1 is inverse-quantized by the vocal tract parameter inverse quantizer 403 to give the vocal tract parameters.

[0047] If the voiced/unvoiced identification code C3 indicates unvoiced sound, switch 210 is opened and switch 409 is closed. The vocal tract correction parameter inverse quantizer 204 inverse-quantizes the vocal tract parameter correction code C2 to obtain the vocal tract correction parameters P2, which are used to correct the vocal tract parameters. The corrected parameters are then interpolated for each subframe by the interpolator 407 (Ph).

[0048] The excitation signal vector is then formed only from the optimal statistical excitation vector from the statistical codebook 206 corresponding to the statistical excitation code, and the synthesis filter 408, using the corrected vocal tract parameters, synthesizes the reproduced speech output.

[0049] Conversely, if the voiced/unvoiced identification code C3 indicates voiced sound, switch 210 is closed and switch 409 is opened. The excitation vector is formed from the optimal adaptive excitation vector from the adaptive codebook 205 corresponding to the adaptive excitation code C4 and the optimal statistical excitation vector from the statistical codebook 206 corresponding to the statistical excitation code C5, and the synthesis filter 208, using the uncorrected interpolated vocal tract parameters Ph, synthesizes the reproduced speech output.

[0050]

[Effects of the Invention] By making it possible to select between a coding method that codes voiced sound effectively and a coding method that codes unvoiced sound effectively, the present invention realizes a speech compression coding method of higher quality and higher efficiency.

[Brief description of the drawings]

FIG. 1 is a block diagram of an encoder to which the present invention is applied.

FIG. 2 is a block diagram of the decoder corresponding to FIG. 1.

FIG. 3 is a block diagram of another encoder to which the present invention is applied.

FIG. 4 is a block diagram of the decoder corresponding to FIG. 3.

[Explanation of symbols]

101 vocal tract analyzer
102 vocal tract parameter quantizer
103 interpolator
104 vocal tract correction parameter quantizer
105 long-period analyzer
106 switch
107 switch
108 adaptive codebook
109 statistical codebook
110 synthesis filter
111 error calculator
112 minimum error selector
113 multiplexer
114 subtractor
115 adder
116 subtractor

Continuation of the front page
(72) Inventor: Yoshihiro Ariyama, c/o Oki Electric Industry Co., Ltd., 1-7-12 Toranomon, Minato-ku, Tokyo
(56) References: JP-A-1-54497 (JP, A); JP-A-59-172690 (JP, A)
(58) Fields searched (Int. Cl.7, DB name): G10L 11/00 - 21/06; H03M 7/30; JICST file (JOIS)

Claims

(57) [Claims]

1. A speech coding method using: means for obtaining vocal tract parameters by linear prediction analysis of an input speech signal; vocal tract parameter quantization means for quantizing the vocal tract parameters and outputting quantized vocal tract parameters and a corresponding vocal tract parameter code; vocal tract correction parameter quantization means for quantizing the error between the vocal tract parameters of the input speech signal and the quantized vocal tract parameters, either after interpolation or without interpolation, and outputting quantized vocal tract correction parameters and a corresponding vocal tract parameter correction code; an adaptive codebook storing adaptive excitation vectors representing excitation source parameters of the past input speech signal; and a statistical codebook storing statistical excitation vectors, which are predetermined excitation source parameters; in which a synthesized speech signal is created from the quantized vocal tract parameters and the excitation source parameters, and an excitation source output code is determined by evaluating the error between the input speech signal and the synthesized speech signal; the method being characterized in that: whether the input speech signal is voiced or unvoiced is determined by long-period analysis of the input speech signal; in the case of voiced sound, the synthesized speech signal is created from the quantized vocal tract parameters and the sum vector of the adaptive excitation vector and the statistical excitation vector, and an adaptive excitation code and a statistical excitation code are determined by evaluating the error between the input speech signal and the synthesized speech signal; in the case of unvoiced sound, the synthesized speech signal is created from the sum of the quantized vocal tract parameters and the quantized vocal tract correction parameters and from the statistical excitation vector, and a statistical excitation code is determined by evaluating the error between the input speech signal and the synthesized speech signal; and, depending on whether the sound is voiced or unvoiced, either the adaptive excitation code or the vocal tract parameter correction code, respectively, is multiplexed with the other codes to form an output word.

2. The speech coding method according to claim 1, characterized in that, of the codes multiplexed into the output word, the vocal tract parameter code and the voiced/unvoiced identification code, which is the result of the voiced/unvoiced determination, are information updated every frame, and the vocal tract parameter correction code, the adaptive excitation code, and the statistical excitation code are information updated every subframe.