JP4034929B2

JP4034929B2 - Speech encoding device

Info

Publication number: JP4034929B2
Application number: JP2000252349A
Authority: JP
Inventors: 利幸森井; 和敏安永
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1999-08-23
Filing date: 2000-08-23
Publication date: 2008-01-16
Anticipated expiration: 2020-08-23
Also published as: JP2001142500A

Description

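[0001]
[Technical field of the invention]
The present invention relates to a speech encoding device used in a digital communication system.
[0002]
[Prior art]
In digital mobile communications such as mobile phones, low-bit-rate speech compression coding methods are needed to cope with the growing number of subscribers, and research institutions are actively developing them.
[0003]
In Japan, VSELP, a coding method with a bit rate of 11.2 kbps developed by Motorola, was adopted as the standard coding scheme for digital mobile phones, and digital mobile phones using it have been on sale in Japan since the autumn of 1994.
[0004]
In addition, PSI-CELP, a coding scheme with a bit rate of 5.6 kbps developed by NTT Mobile Communications Network Inc., has now been commercialized. Both schemes are improvements of CELP (Code Excited Linear Prediction, described in M. R. Schroeder, "High Quality Speech at Low Bit Rates," Proc. ICASSP '85, pp. 937-940).
[0005]
CELP separates speech into excitation information and vocal tract information. The excitation information is encoded as indices of excitation samples stored in codebooks, and the vocal tract information is encoded as LPC (linear prediction coefficients). CELP is also characterized by analysis by synthesis (A-b-S): when the excitation information is encoded, each candidate is compared with the input speech after the vocal tract information has been applied to it.
[0006]
In CELP, autocorrelation analysis and LPC analysis are first performed on the input speech data (input speech) to obtain LPC coefficients, which are then encoded to obtain an LPC code. The LPC code is decoded to obtain decoded LPC coefficients. Meanwhile, the input speech is perceptually weighted by a perceptual weighting filter that uses the LPC coefficients.
[0007]
Each code vector of the excitation samples stored in the adaptive codebook and the stochastic codebook (called the adaptive code vector, or adaptive excitation, and the stochastic code vector, or stochastic excitation, respectively) is filtered with the decoded LPC coefficients to obtain two synthesized sounds.
[0008]
The relationship between the two synthesized sounds and the perceptually weighted input speech is then analyzed to find the optimum values (optimum gains) for the two synthesized sounds. The synthesized sounds are power-adjusted with the optimum gains and added to obtain a total synthesized sound, and the coding distortion between the total synthesized sound and the input speech is calculated. This is repeated for all excitation samples, and the indices of the excitation samples that minimize the coding distortion are found.
[0009]
The gains and excitation sample indices obtained in this way are encoded and sent to the transmission path together with the LPC code. In addition, the actual excitation signal is created from the two excitations corresponding to the gain code and the excitation sample indices and stored in the adaptive codebook, while old excitation samples are discarded.
[0010]
In general, the excitation search of the adaptive and stochastic codebooks is performed on sections (called subframes) obtained by further dividing the analysis section.
[0011]
Gain coding (gain quantization) is performed by vector quantization (VQ), which evaluates the gain quantization distortion using the two synthesized sounds corresponding to the excitation sample indices.
[0012]
In this algorithm, a vector codebook storing a number of representative samples (code vectors) of the parameter vector is created in advance. Then, for the perceptually weighted input speech and the perceptually weighted LPC-synthesized adaptive and stochastic excitations, the coding distortion is calculated with Equation 1 below using the gain code vectors stored in the vector codebook.
[0013]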
[Equation 1]
$$E_n = \sum_{i=0}^{I-1} \left( X_i - g_n A_i - h_n S_i \right)^2$$
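where
E_n: coding distortion when the n-th gain code vector is used
X_i: perceptually weighted speech
A_i: perceptually weighted LPC-synthesized adaptive excitation
S_i: perceptually weighted LPC-synthesized stochastic excitation
g_n: code vector element (gain of the adaptive excitation)
h_n: code vector element (gain of the stochastic excitation)
n: code vector number
i: index of the excitation data
I: subframe length (coding unit of the input speech)
[0014]
Next, the vector codebook is controlled so that the distortions E_n obtained with each code vector are compared. The number of the code vector that gives the smallest distortion among all code vectors stored in the vector codebook is taken as the code of the vector.
[0015]
Equation 1 appears at first glance to require a large amount of computation for each n, but because the sums of products over i can be computed in advance, the search over n can be carried out with little computation.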
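The saving described in paragraph [0015] can be sketched in a few lines of Python. This is only an illustration under the assumption that Equation 1 is the squared error between X_i and g_n·A_i + h_n·S_i; the function names `precompute_sums` and `search_gain_codebook` are hypothetical and do not appear in the patent.

```python
def precompute_sums(x, a, s):
    """Sums of products over i; none of them depends on the code vector number n."""
    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))
    return dot(x, x), dot(x, a), dot(x, s), dot(a, a), dot(a, s), dot(s, s)


def search_gain_codebook(x, a, s, codebook):
    """Return (n, E_n) for the gain code vector (g_n, h_n) with the smallest distortion."""
    xx, xa, xs, aa, as_, ss = precompute_sums(x, a, s)
    best_n, best_e = -1, float("inf")
    for n, (g, h) in enumerate(codebook):
        # Expanded form of sum_i (x_i - g*a_i - h*s_i)^2: a handful of
        # multiplications per code vector instead of a loop over the subframe.
        e = xx - 2 * g * xa - 2 * h * xs + g * g * aa + 2 * g * h * as_ + h * h * ss
        if e < best_e:
            best_n, best_e = n, e
    return best_n, best_e
```

The six sums cost one pass over the subframe; after that each code vector is evaluated in constant time, which is what makes an exhaustive search over n affordable.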
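[0016]
The speech decoding device (decoder), on the other hand, decodes the encoded data by looking up the code vector that corresponds to the transmitted vector code.
[0017]
Further improvements have been built on this basic algorithm. For example, exploiting the logarithmic nature of human loudness perception, the power is quantized in the logarithmic domain and the two gains normalized by that power are vector-quantized. This method is used in the Japanese PDC half-rate codec standard. Another method encodes the gain parameters by exploiting their inter-frame correlation (predictive coding); it is used in the ITU-T international standard G.729. Even with these improvements, however, sufficient performance has not been obtained.
[0018]
[Problems to be solved by the invention]
Gain coding methods that exploit human auditory characteristics and inter-frame correlation have made reasonably efficient gain coding possible. Predictive quantization in particular improved performance considerably, but the conventional method uses the values of previous subframes unchanged as the state values for predictive quantization. Some of the values stored as state are extremely large (or small), and when such a value is used for the next subframe, quantization of that subframe fails and a local abnormal sound may result.
[0019]
The present invention has been made in view of this problem, and its object is to provide a CELP speech encoding device that can encode speech using predictive quantization without producing local abnormal sounds.
[0020]
[Means for solving the problems]
The gist of the present invention is to prevent local abnormal sounds by automatically adjusting the prediction coefficients in predictive quantization when the state value of a previous subframe is extremely large or extremely small.
[0021]
[Embodiments of the invention]
The speech encoding device of the present invention comprises: LPC synthesis means that obtains synthesized sounds by filtering the adaptive excitation and the stochastic excitation stored in the adaptive codebook and the stochastic codebook with LPC coefficients obtained from the input speech; gain calculation means that obtains the gains of the adaptive excitation and the stochastic excitation and searches for the codes of the adaptive excitation and the stochastic excitation using the coding distortion between the input speech and the synthesized sound obtained with those gains; and parameter encoding means that performs predictive coding of the gains using the adaptive excitation and the stochastic excitation corresponding to the codes found. The parameter encoding means includes prediction coefficient adjustment means that automatically adjusts the prediction coefficients used for the predictive coding according to the state of previous subframes.
[0022]
With this configuration, the prediction coefficients can be controlled according to each code vector, which enables more efficient prediction adapted to the local characteristics of the speech and prevents the adverse effects of prediction in non-stationary portions.
[0023]
The speech encoding device of the present invention is also a CELP speech encoding device that divides one frame into a plurality of subframes for encoding and comprises: LPC synthesis means that obtains synthesized sounds by filtering the adaptive excitation and the stochastic excitation stored in the adaptive codebook and the stochastic codebook with LPC coefficients obtained from the input speech; gain calculation means that obtains the gains of the adaptive excitation and the stochastic excitation; and parameter encoding means that vector-quantizes the adaptive excitation and the stochastic excitation found using the coding distortion between the input speech and the synthesized sound, together with the gains. The device further includes pitch analysis means that, before the adaptive codebook search of the first subframe, performs pitch analysis of the plurality of subframes making up the frame to obtain correlation values and uses the correlation values to calculate the value that best approximates the pitch period.
[0024]
With this configuration, the neighborhood of the tentative pitch of the second subframe can be searched when the second subframe is searched, so an appropriate lag search is possible in both the first and second subframes even for non-stationary frames, such as frames in which speech begins in the second half.
[0025]
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of a wireless communication device equipped with the speech encoding device according to Embodiments 1 and 2 of the present invention.
[0026]
In this wireless communication device, on the transmitting side, speech is converted into an electrical analog signal by a speech input device 11 such as a microphone and output to the A/D converter 12. The analog speech signal is converted into a digital speech signal by the A/D converter 12 and output to the speech encoding unit 13. The speech encoding unit 13 performs speech encoding on the digital speech signal and outputs the encoded information to the modulation/demodulation unit 14. The modulation/demodulation unit 14 digitally modulates the encoded speech signal and sends it to the radio transmission unit 15, which applies predetermined radio transmission processing to the modulated signal. The signal is transmitted via the antenna 16. The processor 21 performs its processing using data stored in the RAM 22 and the ROM 23 as needed.
[0027]
On the receiving side of the wireless communication device, the signal received by the antenna 16 undergoes predetermined radio reception processing in the radio reception unit 17 and is sent to the modulation/demodulation unit 14. The modulation/demodulation unit 14 demodulates the received signal and outputs the demodulated signal to the speech decoding unit 18. The speech decoding unit 18 decodes the demodulated signal to obtain a digital decoded speech signal and outputs it to the D/A converter 19. The D/A converter 19 converts the digital decoded speech signal output from the speech decoding unit 18 into an analog decoded speech signal and outputs it to a speech output device 20 such as a speaker. Finally, the speech output device 20 converts the electrical analog decoded speech signal into decoded speech and outputs it.
[0028]
The speech encoding unit 13 and the speech decoding unit 18 are run by a processor 21 such as a DSP, using the codebooks stored in the RAM 22 and the ROM 23. Their operating programs are stored in the ROM 23.
[0029]
FIG. 2 is a block diagram showing the configuration of the CELP speech encoding device according to Embodiment 1 of the present invention. This speech encoding device is included in the speech encoding unit 13 shown in FIG. 1. The adaptive codebook 103 shown in FIG. 2 is stored in the RAM 22 shown in FIG. 1, and the stochastic codebook 104 shown in FIG. 2 is stored in the ROM 23 shown in FIG. 1.
[0030]
In the speech encoding device shown in FIG. 2, the LPC analysis unit 102 performs autocorrelation analysis and LPC analysis on the input speech data (input speech) 101 to obtain LPC coefficients. The LPC analysis unit 102 also encodes the LPC coefficients to obtain an LPC code, and decodes the LPC code to obtain decoded LPC coefficients. The input speech data 101 is sent to the perceptual weighting unit 107, where it is perceptually weighted by a perceptual weighting filter that uses the LPC coefficients.
[0031]
Next, the excitation generation unit 105 retrieves an excitation sample stored in the adaptive codebook 103 (adaptive code vector, or adaptive excitation) and an excitation sample stored in the stochastic codebook 104 (stochastic code vector, or stochastic excitation) and sends each code vector to the perceptual weighting LPC synthesis unit 106. The perceptual weighting LPC synthesis unit 106 filters the two excitations obtained from the excitation generation unit 105 with the decoded LPC coefficients obtained by the LPC analysis unit 102 to obtain two synthesized sounds.
[0032]
The perceptual weighting LPC synthesis unit 106 performs perceptually weighted LPC synthesis on each synthesized sound by additionally applying a perceptual weighting filter that uses the LPC coefficients, a high-frequency emphasis filter, and long-term prediction coefficients (obtained by long-term prediction analysis of the input speech).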
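[0033]
The perceptual weighting LPC synthesis unit 106 outputs the two synthesized sounds to the gain calculation unit 108, which has the configuration shown in FIG. 3. In the gain calculation unit 108, the two synthesized sounds obtained by the perceptual weighting LPC synthesis unit 106 and the perceptually weighted input speech are sent to the analysis unit 1081, which analyzes the relationship between the two synthesized sounds and the input speech and finds the optimum values (optimum gains) for the two synthesized sounds. The optimum gains are output to the power adjustment unit 1082.
[0034]
The power adjustment unit 1082 adjusts the power of the two synthesized sounds with the optimum gains. The power-adjusted synthesized sounds are output to the synthesis unit 1083, where they are added to form the total synthesized sound. The total synthesized sound is output to the coding distortion calculation unit 1084, which calculates the coding distortion between the total synthesized sound and the input speech.
[0035]
The coding distortion calculation unit 1084 controls the excitation generation unit 105 so that all excitation samples of the adaptive codebook 103 and the stochastic codebook 104 are output, calculates the coding distortion between the total synthesized sound and the input speech for all excitation samples, and finds the indices of the excitation samples that minimize the coding distortion.
[0036]
Next, the analysis unit 1081 sends the excitation sample indices, the two perceptually weighted LPC-synthesized excitations corresponding to those indices, and the input speech to the parameter encoding unit 109.
[0037]
The parameter encoding unit 109 encodes the gains to obtain a gain code and sends it to the transmission path together with the LPC code and the excitation sample indices. It also creates the actual excitation signal from the two excitations corresponding to the gain code and the indices, stores it in the adaptive codebook 103, and discards old excitation samples. In general, the excitation search of the adaptive and stochastic codebooks is performed on sections (called subframes) obtained by further dividing the analysis section.
[0038]
The gain coding operation of the parameter encoding unit 109 of the speech encoding device configured as above is described next. FIG. 4 is a block diagram showing the configuration of the parameter encoding unit of the speech encoding device of the present invention.
[0039]
In FIG. 4, the perceptually weighted input speech (X_i), the perceptually weighted LPC-synthesized adaptive excitation (A_i), and the perceptually weighted LPC-synthesized stochastic excitation (S_i) are sent to the parameter calculation unit 1091, which calculates the parameters needed for the coding distortion calculation. These parameters are output to the coding distortion calculation unit 1092, where the coding distortion is calculated. The coding distortion is output to the comparison unit 1093. The comparison unit 1093 controls the coding distortion calculation unit 1092 and the vector codebook 1094, finds the most suitable code (decoded vector) from the coding distortions obtained, and outputs the code vector obtained from the vector codebook 1094 on the basis of this code to the decoded vector storage unit 1096, thereby updating the decoded vector storage unit 1096.
[0040]
The prediction coefficient storage unit 1095 stores the prediction coefficients used for predictive coding. Because they are used in the parameter calculation and the coding distortion calculation, they are output to the parameter calculation unit 1091 and the coding distortion calculation unit 1092. The decoded vector storage unit 1096 stores the state for predictive coding. Because the state is used in the parameter calculation, it is output to the parameter calculation unit 1091. The vector codebook 1094 stores the code vectors.
[0041]
Next, the algorithm of the gain coding method according to the present invention is described.
A vector codebook 1094 storing a number of representative samples (code vectors) of the vector to be quantized is created in advance. Each vector has three elements: the AC (adaptive codebook) gain, a value corresponding to the logarithm of the SC (stochastic codebook) gain, and an adjustment coefficient for the SC prediction coefficients.
[0042]
The adjustment coefficient adjusts the prediction coefficients according to the state of previous subframes. Specifically, it is set so as to reduce the influence of a previous subframe's state when that state is extremely large or extremely small. It can be obtained with a training algorithm developed by the inventors that uses a large number of vector samples. The training algorithm is not described here.
[0043]
For example, code vectors used frequently for voiced speech are given a large adjustment coefficient. That is, when the same waveform repeats, the state of the previous subframes is highly reliable, so the adjustment coefficient is made large and the prediction coefficients of the previous subframes can be used as they are. This allows more efficient prediction.
[0044]
Conversely, code vectors that are used infrequently, such as those used at word onsets, are given a small adjustment coefficient. That is, when the waveform differs completely from the preceding one, the state of the previous subframes is unreliable (the adaptive codebook is considered not to function), so the adjustment coefficient is made small to reduce the influence of the prediction coefficients of the previous subframes. This prevents adverse effects on the next prediction and achieves good predictive coding.
[0045]
By controlling the prediction coefficients according to each code vector (state) in this way, the performance of predictive coding can be improved further.
[0046]
The prediction coefficient storage unit 1095 stores the prediction coefficients for predictive coding. They are MA (moving average) prediction coefficients, and two kinds, for AC and SC, are stored for each prediction order. Their values are generally determined in advance by training on a large amount of data. The decoded vector storage unit 1096 is initialized with values representing silence.
[0047]
The coding method is now described in detail. First, the perceptually weighted input speech (X_i), the perceptually weighted LPC-synthesized adaptive excitation (A_i), and the perceptually weighted LPC-synthesized stochastic excitation (S_i) are sent to the parameter calculation unit 1091, together with the decoded vectors (AC, SC, adjustment coefficient) stored in the decoded vector storage unit 1096 and the prediction coefficients (AC, SC) stored in the prediction coefficient storage unit 1095. These are used to calculate the parameters needed for the coding distortion calculation.
[0048]
The coding distortion calculation in the coding distortion calculation unit 1092 follows Equation 2 below.
[0049]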
[Equation 2]
$$E_n = \sum_{i=0}^{I-1} \left( X_i - G_{an} A_i - G_{sn} S_i \right)^2$$
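where
G_an, G_sn: decoded gains
E_n: coding distortion when the n-th gain code vector is used
X_i: perceptually weighted speech
A_i: perceptually weighted LPC-synthesized adaptive excitation
S_i: perceptually weighted LPC-synthesized stochastic excitation
n: code vector number
i: index of the excitation vector
I: subframe length (coding unit of the input speech)
[0050]
To reduce the amount of computation, the parameter calculation unit 1091 calculates the parts that do not depend on the code vector number. What is calculated in advance are the prediction vector described above and the correlations and powers among the three synthesized sounds (X_i, A_i, S_i). The correlations and powers are calculated with Equation 3 below.
[0051]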
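[Equation 3]
$$\begin{aligned}
D_{xx} &= \sum_{i=0}^{I-1} X_i X_i, & D_{xa} &= 2\sum_{i=0}^{I-1} X_i A_i, & D_{xs} &= 2\sum_{i=0}^{I-1} X_i S_i, \\
D_{aa} &= \sum_{i=0}^{I-1} A_i A_i, & D_{as} &= 2\sum_{i=0}^{I-1} A_i S_i, & D_{ss} &= \sum_{i=0}^{I-1} S_i S_i
\end{aligned}$$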
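D_xx, D_xa, D_xs, D_aa, D_as, D_ss: correlation values and powers among the synthesized sounds
X_i: perceptually weighted speech
A_i: perceptually weighted LPC-synthesized adaptive excitation
S_i: perceptually weighted LPC-synthesized stochastic excitation
n: code vector number
i: index of the excitation vector
I: subframe length (coding unit of the input speech)
[0052]
The parameter calculation unit 1091 also calculates the three predicted values shown in Equation 4 below, using the past code vectors stored in the decoded vector storage unit 1096 and the prediction coefficients stored in the prediction coefficient storage unit 1095.
[0053]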
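[Equation 4]
$$P_{ra} = \sum_{m=0}^{M} \alpha_m S_{am}, \qquad P_{rs} = \sum_{m=0}^{M} \beta_m S_{cm} S_{sm}, \qquad P_{sc} = \sum_{m=0}^{M} \beta_m S_{cm}$$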
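where
P_ra: predicted value (AC gain)
P_rs: predicted value (SC gain)
P_sc: predicted value (prediction coefficient)
α_m: prediction coefficient (AC gain, fixed value)
β_m: prediction coefficient (SC gain, fixed value)
S_am: state (element of a past code vector, AC gain)
S_sm: state (element of a past code vector, SC gain)
S_cm: state (element of a past code vector, SC prediction coefficient adjustment coefficient)
m: prediction index
M: prediction order
[0054]
As Equation 4 shows, P_rs and P_sc are multiplied by the adjustment coefficient, unlike in the conventional method. For the predicted value and the prediction coefficient of the SC gain, the adjustment coefficient can therefore moderate (reduce the influence of) a state value of a previous subframe that is extremely large or small. In other words, the predicted value and the prediction coefficient of the SC gain can be changed adaptively according to the state.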
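The role of the adjustment coefficient can be illustrated with a short Python sketch. It assumes that Equation 4 is a plain MA sum in which the SC terms are weighted by the stored adjustment coefficients S_cm; the function name `predicted_values` and the list layout (index m = 0 is the most recent subframe) are assumptions made for this illustration only.

```python
def predicted_values(alpha, beta, s_a, s_s, s_c):
    """MA prediction from the stored state.

    alpha, beta : fixed prediction coefficients for the AC and SC gains
    s_a, s_s    : past AC gains and past (log-domain) SC gains
    s_c         : past adjustment coefficients for the SC prediction
    """
    p_ra = sum(al * sa for al, sa in zip(alpha, s_a))
    # Each SC term is scaled by the adjustment coefficient stored with it,
    # so an unreliable past state contributes less to the prediction.
    p_rs = sum(b * sc * ss for b, sc, ss in zip(beta, s_c, s_s))
    p_sc = sum(b * sc for b, sc in zip(beta, s_c))
    return p_ra, p_rs, p_sc
```

For instance, a past SC value of -30 stored with an adjustment coefficient of 0.1 enters the prediction as if it were -3, which is the damping effect paragraph [0054] describes.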
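[0055]
Next, the coding distortion calculation unit 1092 calculates the coding distortion according to Equation 5 below, using the parameters calculated by the parameter calculation unit 1091, the prediction coefficients stored in the prediction coefficient storage unit 1095, and the code vectors stored in the vector codebook 1094.
[0056]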
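[Equation 5]
$$\begin{aligned}
E_n &= D_{xx} + G_{an}^2 D_{aa} + G_{sn}^2 D_{ss} - G_{an} D_{xa} - G_{sn} D_{xs} + G_{an} G_{sn} D_{as} \\
G_{an} &= P_{ra} + (1 - P_{ac})\, C_{an} \\
G_{sn} &= 10^{\,P_{rs} + (1 - P_{sc})\, C_{sn}}
\end{aligned}$$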
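where
E_n: coding distortion when the n-th gain code vector is used
D_xx, D_xa, D_xs, D_aa, D_as, D_ss: correlation values and powers among the synthesized sounds
G_an, G_sn: decoded gains
P_ra: predicted value (AC gain)
P_rs: predicted value (SC gain)
P_ac: sum of the prediction coefficients (fixed value)
P_sc: sum of the prediction coefficients (calculated with Equation 4 above)
C_an, C_sn, C_cn: code vector elements; C_cn is the prediction coefficient adjustment coefficient but is not used here
n: code vector number
Because D_xx does not depend on the code vector number n, its addition can be omitted in practice.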
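The distortion evaluation and codebook search of paragraphs [0055] and [0056] can be sketched as follows. This is a hedged illustration, not the patented implementation: it assumes that the cross terms D_xa, D_xs and D_as already include the factor 2, that the AC gain is predicted linearly, and that the SC gain is predicted in the log10 domain. All function names are hypothetical.

```python
import math


def correlations(x, a, s):
    """Code-vector-independent terms (cross terms carry the factor 2 by assumption)."""
    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))
    return {"xx": dot(x, x), "xa": 2.0 * dot(x, a), "xs": 2.0 * dot(x, s),
            "aa": dot(a, a), "as": 2.0 * dot(a, s), "ss": dot(s, s)}


def decode_gains(c_a, c_s, p_ra, p_rs, p_ac, p_sc):
    """Decoded gains: prediction plus the scaled code vector element."""
    g_a = p_ra + (1.0 - p_ac) * c_a            # AC gain, linear domain
    g_s = 10.0 ** (p_rs + (1.0 - p_sc) * c_s)  # SC gain, log10 domain (assumed)
    return g_a, g_s


def distortion(d, g_a, g_s):
    """Expanded squared error between X and g_a*A + g_s*S."""
    return (d["xx"] + g_a * g_a * d["aa"] + g_s * g_s * d["ss"]
            - g_a * d["xa"] - g_s * d["xs"] + g_a * g_s * d["as"])


def search_codebook(d, codebook, p_ra, p_rs, p_ac, p_sc):
    """Return (J, E_J): the code vector number with the smallest distortion."""
    best_j, best_e = -1, math.inf
    for n, (c_a, c_s, _c_c) in enumerate(codebook):  # C_cn is not used in the search
        g_a, g_s = decode_gains(c_a, c_s, p_ra, p_rs, p_ac, p_sc)
        e = distortion(d, g_a, g_s)
        if e < best_e:
            best_j, best_e = n, e
    return best_j, best_e
```

Since `d["xx"]` is the same for every code vector, it could be dropped from `distortion` without changing which vector wins; it is kept here so that the returned value equals the true squared error.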
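[0057]
Next, the comparison unit 1093 controls the vector codebook 1094 and the coding distortion calculation unit 1092, finds the number of the code vector that minimizes the coding distortion calculated by the coding distortion calculation unit 1092 among the code vectors stored in the vector codebook 1094, and takes it as the gain code. The contents of the decoded vector storage unit 1096 are then updated using the gain code obtained. The update follows Equation 6 below.
[0058]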
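[Equation 6]
$$\begin{aligned}
S_{am} &= S_{a(m-1)} \ (m = M, \dots, 1), & S_{a0} &= C_{aJ} \\
S_{sm} &= S_{s(m-1)} \ (m = M, \dots, 1), & S_{s0} &= C_{sJ} \\
S_{cm} &= S_{c(m-1)} \ (m = M, \dots, 1), & S_{c0} &= C_{cJ}
\end{aligned}$$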
where
S_am, S_sm, S_cm: state vectors (AC, SC, prediction coefficient adjustment coefficient)
m: prediction index
M: prediction order
J: code found by the comparison unit
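The state update can be pictured as a shift register. The sketch below assumes the state is kept as a list of (S_a, S_s, S_c) tuples with the most recent subframe at index 0; `update_state` is a hypothetical name used only for this illustration.

```python
def update_state(state, chosen):
    """Shift the stored code vectors by one subframe and insert the selected one.

    state  : list of (S_a, S_s, S_c) tuples, newest first
    chosen : the code vector (C_aJ, C_sJ, C_cJ) selected by the comparison unit
    """
    # The oldest entry falls off the end; the list length is unchanged.
    return [tuple(chosen)] + list(state[:-1])
```

Because the adjustment coefficient C_cJ travels with the gains in the same tuple, it is available when the next subframe's predicted values are computed.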
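[0059]
As Equations 4 to 6 show, in this embodiment the prediction coefficient adjustment coefficients S_cm, which are elements of past code vectors, are stored in the decoded vector storage unit 1096 and used to control the prediction coefficients adaptively.
[0060]
FIG. 5 is a block diagram showing the configuration of the speech decoding device according to the embodiment of the present invention. This speech decoding device is included in the speech decoding unit 18 shown in FIG. 1. The adaptive codebook 202 shown in FIG. 5 is stored in the RAM 22 shown in FIG. 1, and the stochastic codebook 203 shown in FIG. 5 is stored in the ROM 23 shown in FIG. 1.
[0061]
In the speech decoding device shown in FIG. 5, the parameter decoding unit 201 obtains the encoded speech signal from the transmission path and extracts the excitation sample codes for each excitation codebook (adaptive codebook 202, stochastic codebook 203), the LPC code, and the gain code. It then obtains decoded LPC coefficients from the LPC code and decoded gains from the gain code.
[0062]
The excitation generation unit 204 multiplies each excitation sample by its decoded gain and adds the results to obtain the decoded excitation signal. The decoded excitation signal is stored in the adaptive codebook 202 as an excitation sample, and old excitation samples are discarded at the same time. The LPC synthesis unit 205 then filters the decoded excitation signal with the decoded LPC coefficients to obtain the synthesized sound.
[0063]
The two excitation codebooks are the same as those in the speech encoding device shown in FIG. 2 (reference numerals 103 and 104 in FIG. 2), and the sample numbers used to retrieve excitation samples (the adaptive codebook code and the stochastic codebook code) are both supplied by the parameter decoding unit 201.
[0064]
Thus, the speech encoding device of this embodiment can control the prediction coefficients according to each code vector, which enables more efficient prediction adapted to the local characteristics of the speech and prevents the adverse effects of prediction in non-stationary portions, an effect not obtainable conventionally.
[0065]
(Embodiment 2)
As described above, the gain calculation unit of the speech encoding device compares the synthesized sound with the input speech for all excitations of the adaptive codebook and the stochastic codebook obtained from the excitation generation unit. To limit the amount of computation, the two excitations (adaptive codebook and stochastic codebook) are normally searched in an open loop. This is described below with reference to FIG. 2.
[0066]
In this open-loop search, the excitation generation unit 105 first selects excitation candidates one after another from the adaptive codebook 103 only and operates the perceptual weighting LPC synthesis unit 106 to obtain synthesized sounds, which are sent to the gain calculation unit 108. The gain calculation unit 108 compares the synthesized sounds with the input speech and selects the optimum code of the adaptive codebook 103.
[0067]
Next, with the code of the adaptive codebook 103 fixed, the same excitation is selected from the adaptive codebook 103, while excitations corresponding to the codes specified by the gain calculation unit 108 are selected one after another from the stochastic codebook 104 and sent to the perceptual weighting LPC synthesis unit 106. The gain calculation unit 108 compares the sum of the two synthesized sounds with the input speech and determines the code of the stochastic codebook 104.
[0068]
With this algorithm, coding performance is slightly lower than when all code combinations of all codebooks are searched, but the amount of computation is greatly reduced. The open-loop search is therefore generally used.
[0069]
A representative conventional open-loop excitation search algorithm is described here, for the case where one analysis section (frame) consists of two subframes.
[0070]
First, on instruction from the gain calculation unit 108, the excitation generation unit 105 takes excitations from the adaptive codebook 103 and sends them to the perceptual weighting LPC synthesis unit 106. The gain calculation unit 108 repeatedly compares the synthesized excitation with the input speech of the first subframe to find the optimum code. A feature of the adaptive codebook is that it consists of excitations used for synthesis in the past, and that its codes correspond to time lags, as shown in FIG. 6.
[0071]
After the code of the adaptive codebook 103 has been decided, the stochastic codebook is searched. The excitation generation unit 105 retrieves the excitation of the code obtained in the search of the adaptive codebook 103 and the excitation of the stochastic codebook 104 specified by the gain calculation unit 108 and sends them to the perceptual weighting LPC synthesis unit 106. The gain calculation unit 108 then calculates the coding distortion between the perceptually weighted synthesized sound and the perceptually weighted input speech and determines the most suitable code of the stochastic codebook 104 (the one that minimizes the squared error). The excitation code search procedure for one analysis section (with two subframes) is as follows.
[0072]
1) Determine the adaptive codebook code of the first subframe.
2) Determine the stochastic codebook code of the first subframe.
3) Encode the gains in the parameter encoding unit 109, create the excitation of the first subframe with the decoded gains, and update the adaptive codebook 103.
4) Determine the adaptive codebook code of the second subframe.
5) Determine the stochastic codebook code of the second subframe.
6) Encode the gains in the parameter encoding unit 109, create the excitation of the second subframe with the decoded gains, and update the adaptive codebook 103.
[0073]
This algorithm encodes the excitation efficiently. Recently, however, efforts have been made to save excitation bits in pursuit of still lower bit rates. One approach that has attracted particular attention exploits the strong correlation between adaptive codebook lags: the code of the first subframe is left as it is, and the search range of the second subframe is narrowed to the neighborhood of the first subframe's lag (reducing the number of entries) to reduce the number of bits.
[0074]
With this algorithm, local degradation may occur when the speech changes partway through the analysis section (frame) or when the two subframes differ greatly.
[0075]
This embodiment provides a speech encoding device that implements a search method in which pitch analysis is performed on both subframes before encoding to calculate correlation values, and the lag search ranges of the two subframes are determined on the basis of those correlation values.
[0076]
Specifically, the speech encoding device of this embodiment is a CELP encoding device that divides one frame into a plurality of subframes and encodes each of them, and is characterized by comprising: a pitch analysis unit that, before the adaptive codebook search of the first subframe, performs pitch analysis of the plurality of subframes making up the frame to calculate correlation values and, from the magnitudes of the correlation values, finds for each subframe the value most likely to be the pitch period (called the representative pitch); and a search range setting unit that determines the lag search ranges of the subframes on the basis of the correlation values and representative pitches obtained by the pitch analysis unit.
[0077]
In this speech encoding device, the search range setting unit uses the representative pitches and correlation values of the subframes obtained by the pitch analysis unit to find a provisional pitch (called the tentative pitch) that becomes the center of the search range, and sets the lag search section in a specified range before and after the tentative pitch. In doing so, it allots fewer candidates to short lags and a wider range to longer lags. During the adaptive codebook search, the lag is searched within the range set by the search range setting unit.
[0078]
The speech encoding device according to this embodiment is described in detail below with reference to the accompanying drawings. One frame is assumed here to be divided into two subframes. Encoding can be performed by the same procedure with three or more subframes.
[0079]
In this speech encoding device, in pitch search by the so-called delta lag method, pitches are obtained for all of the divided subframes, the degree of correlation between the pitches is determined, and the search range is decided according to the correlation result.
[0080]
FIG. 7 is a block diagram showing the configuration of the speech encoding device according to Embodiment 2 of the present invention. First, the LPC analysis unit 302 performs autocorrelation analysis and LPC analysis on the input speech data (input speech) 301 to obtain LPC coefficients. The LPC analysis unit 302 also encodes the LPC coefficients to obtain an LPC code, and decodes the LPC code to obtain decoded LPC coefficients.
[0081]
Next, the pitch analysis unit 310 performs pitch analysis on two subframes' worth of input speech and finds pitch candidates and parameters. The algorithm for one subframe is as follows. Two correlation quantities are obtained with Equation 7 below. C_pp is first computed for P_min; for the subsequent lags P_min+1, P_min+2, and so on, it can be computed efficiently by adding and subtracting the values at the ends of the frame.
[0082]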
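[Equation 7]
$$V_p = \sum_{i=0}^{L-1} X_i X_{i-P}, \qquad C_{pp} = \sum_{i=0}^{L-1} X_{i-P} X_{i-P} \qquad (P = P_{\min}, \dots, P_{\max})$$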
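where
X_i, X_i-P: input speech
V_p: autocorrelation function
C_pp: power component
i: sample number of the input speech
L: subframe length
P: pitch
P_min, P_max: minimum and maximum values of the pitch search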
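The incremental update of C_pp mentioned in paragraph [0081] can be sketched in Python. The sketch assumes that Equation 7 sums over the L samples of the subframe and that the signal buffer holds at least P_max samples of history before the subframe; `pitch_correlations`, `buf` and `start` are hypothetical names.

```python
def pitch_correlations(buf, start, length, p_min, p_max):
    """Return dicts V[p] and C[p] for p_min <= p <= p_max.

    buf    : input speech samples, history first (requires start >= p_max)
    start  : index in buf of the first sample of the subframe
    length : subframe length L
    """
    v, c = {}, {}
    # Power of the lagged window, computed directly only for the smallest lag.
    power = sum(buf[start + i - p_min] ** 2 for i in range(length))
    for p in range(p_min, p_max + 1):
        v[p] = sum(buf[start + i] * buf[start + i - p] for i in range(length))
        c[p] = power
        if p < p_max:
            # Moving to lag p + 1 shifts the window one sample into the past:
            # add the sample that enters and subtract the one that leaves.
            power += buf[start - p - 1] ** 2 - buf[start + length - 1 - p] ** 2
    return v, c
```

Only the power term benefits from this recursion; the cross-correlation V_p still needs a full sum for each lag in this sketch.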
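[0083]
The autocorrelation function and power components obtained with Equation 7 are stored in memory, and the representative pitch P₁ is found by the following procedure. The procedure finds the pitch P that maximizes V_p × V_p / C_pp with V_p positive. Because division is generally expensive, both the numerator and the denominator are stored and the comparison is rewritten as multiplications for efficiency.
[0084]
Here, the pitch is sought that minimizes the sum of squared differences between the input speech and the adaptive excitation one pitch earlier than the input speech. This is equivalent to finding the pitch P that maximizes V_p × V_p / C_pp. The specific processing is as follows.
[0085]
1) Initialize (P = P_min, VV = C = 0, P₁ = P_min).
2) If (V_p × V_p × C < VV × C_pp) or (V_p < 0), go to 4). Otherwise go to 3).
3) Set VV = V_p × V_p, C = C_pp, P₁ = P, and go to 4).
4) Set P = P + 1. If P > P_max, finish; otherwise go to 2).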
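Steps 1) to 4) above translate almost line for line into Python. The sketch below takes the correlation values as dictionaries indexed by lag; the function name `representative_pitch` is hypothetical.

```python
def representative_pitch(v, c, p_min, p_max):
    """Return the lag that maximizes V_p^2 / C_pp among lags with V_p >= 0."""
    vv, cc, p1 = 0.0, 0.0, p_min           # step 1: initialization
    for p in range(p_min, p_max + 1):      # step 4: P = P + 1 until P > P_max
        # Step 2: V_p^2 / C_pp < VV / C, cross-multiplied to avoid division.
        if v[p] * v[p] * cc < vv * c[p] or v[p] < 0:
            continue
        vv, cc, p1 = v[p] * v[p], c[p], p  # step 3: remember the best so far
    return p1
```

With the initial values VV = C = 0 the first comparison is 0 < 0, which is false, so the first lag with a non-negative V_p is always accepted as the starting candidate.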
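[0086]
The above is carried out for each of the two subframes to obtain the representative pitches P₁ and P₂, the autocorrelation coefficients V_1p and V_2p, and the power components C_1pp and C_2pp (P_min ≤ p ≤ P_max).
[0087]
Next, the search range setting unit 311 sets the lag search range of the adaptive codebook. It first finds the tentative pitch that serves as the axis of the search range, using the representative pitches and parameters obtained by the pitch analysis unit 310.
[0088]
The tentative pitches Q₁ and Q₂ are found by the following procedure. In the description below, a constant Th (a value of about 6 is appropriate) is used as the lag range. The correlation values are those obtained with Equation 7.
[0089]
First, with P₁ fixed, the tentative pitch (Q₂) with the largest correlation is found in the neighborhood of P₁ (±Th).
[0090]
1) Initialize (p = P₁ − Th, C_max = 0, Q₁ = P₁, Q₂ = P₁).
2) If (V_1P₁ × V_1P₁ / C_1P₁P₁ + V_2p × V_2p / C_2pp < C_max) or (V_2p < 0), go to 4). Otherwise go to 3).
3) Set C_max = V_1P₁ × V_1P₁ / C_1P₁P₁ + V_2p × V_2p / C_2pp and Q₂ = p, and go to 4).
4) Set p = p + 1 and go to 2). If p > P₁ + Th at this point, go to 5).
[0091]
Steps 2) to 4) are thus carried out from P₁ − Th to P₁ + Th to obtain the largest correlation C_max and the tentative pitch Q₂.
[0092]
Next, with P₂ fixed, the tentative pitch (Q₁) with the largest correlation is found in the neighborhood of P₂ (±Th). C_max is not re-initialized here. By finding the Q₁ that maximizes the correlation while taking into account the C_max obtained when Q₂ was found, the pair Q₁, Q₂ with the largest correlation across the first and second subframes can be obtained.
[0093]
5) Initialize (p = P₂ − Th).
6) If (V_1p × V_1p / C_1pp + V_2P₂ × V_2P₂ / C_2P₂P₂ < C_max) or (V_1p < 0), go to 8). Otherwise go to 7).
7) Set C_max = V_1p × V_1p / C_1pp + V_2P₂ × V_2P₂ / C_2P₂P₂, Q₁ = p, and Q₂ = P₂, and go to 8).
8) Set p = p + 1 and go to 6). If p > P₂ + Th at this point, go to 9).
9) Finish.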
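The two passes of steps 1) to 9) can be sketched as below. The correlation tables are dictionaries indexed by lag. Skipping lags that fall outside the analysed range, and assuming positive power at P₁ and P₂, are choices made for this sketch that the source does not spell out; `tentative_pitches` is a hypothetical name.

```python
def tentative_pitches(v1, c1, v2, c2, p1, p2, th=6):
    """Return (Q1, Q2) chosen jointly from the correlations of both subframes."""
    def score(v, c, p):
        return v[p] * v[p] / c[p]

    c_max, q1, q2 = 0.0, p1, p1
    # Pass 1 (steps 1-4): keep P1 for subframe 1, scan around P1 for subframe 2.
    fixed = score(v1, c1, p1)
    for p in range(p1 - th, p1 + th + 1):
        if p not in v2 or c2[p] <= 0.0:
            continue  # assumption: ignore lags outside the analysed range
        total = fixed + score(v2, c2, p)
        if total < c_max or v2[p] < 0:
            continue
        c_max, q2 = total, p
    # Pass 2 (steps 5-9): keep P2 for subframe 2, scan around P2 for subframe 1.
    # c_max is deliberately not reset, so pass 2 wins only if it does at least as well.
    fixed = score(v2, c2, p2)
    for p in range(p2 - th, p2 + th + 1):
        if p not in v1 or c1[p] <= 0.0:
            continue
        total = score(v1, c1, p) + fixed
        if total < c_max or v1[p] < 0:
            continue
        c_max, q1, q2 = total, p, p2
    return q1, q2
```

When the second subframe has a much stronger correlation peak than the first, the second pass takes over and Q₁ lands within Th of P₂, which is the behavior the later paragraphs rely on for frames where speech starts in the second half.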
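[0094]
Steps 6) to 8) are thus carried out from P₂ − Th to P₂ + Th to obtain the largest correlation C_max and the tentative pitches Q₁ and Q₂. These Q₁ and Q₂ are the tentative pitches of the first and second subframes.
[0095]
With the above algorithm, two tentative pitches whose values differ relatively little (by at most Th) can be selected while the correlations of the two subframes are evaluated simultaneously. Using these tentative pitches prevents a large loss of coding performance even when the search range is set narrow in the adaptive codebook search of the second subframe. For example, when the sound changes abruptly from the second subframe and the correlation of the second subframe is strong, using a Q₁ that reflects the correlation of the second subframe avoids degradation in the second subframe.
[0096]
Further, the search range setting unit 311 uses the tentative pitch Q₁ to set the range of the adaptive codebook search (L_ST to L_EN) as in Equation 8 below.
[0097]
[Equation 8]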

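where
L_ST: start of the search range
L_EN: end of the search range
L_min: minimum lag (e.g., 20)
L_max: maximum lag (e.g., 143)
T₁: adaptive codebook lag of the first subframe
[0098]
In the above setting, the search range of the first subframe does not need to be narrowed. However, the inventors have confirmed experimentally that performance is better when the search section is the neighborhood of a value based on the pitch of the input speech, and this embodiment uses an algorithm that narrows the search to 26 samples.
[0099]
For the second subframe, the search range is set around the lag T₁ found in the first subframe. With 32 entries in total, the adaptive codebook lag of the second subframe can be coded with 5 bits. The inventors have also confirmed experimentally that, here too, better performance is obtained by allotting fewer candidates to small lags and more to large lags. As is clear from the description so far, the tentative pitch Q₂ is not used in this embodiment.
[0100]
The effects of this embodiment are now described. The tentative pitch of the second subframe lies near the tentative pitch of the first subframe obtained by the search range setting unit 311 (because of the constraint imposed by the constant Th). In addition, because the first subframe is searched within a narrowed range, the lag obtained by the search does not stray from the tentative pitch of the first subframe.
[0101]
Therefore, when the second subframe is searched, a range close to its tentative pitch can be searched, and appropriate lags can be found in both the first and second subframes.
[0102]
As an example, consider the case where the first subframe is silent and speech rises from the second subframe. In the conventional method, if narrowing the search range excludes the pitch of the second subframe from the search section, the sound quality degrades greatly. In the method of this embodiment, the correlation of the representative pitch P₂ comes out strongly in the tentative pitch analysis of the pitch analysis unit. The tentative pitch of the first subframe therefore takes a value near P₂. As a result, in the delta-lag search, a value close to the portion where the speech rises can be used as the tentative pitch. That is, the adaptive codebook search of the second subframe can search values near P₂, so even when a speech onset occurs partway through the frame, the adaptive codebook search of the second subframe by delta lag can be performed without degradation.
[0103]
Next, the excitation generation unit 305 retrieves an excitation sample stored in the adaptive codebook 303 (adaptive code vector, or adaptive excitation) and an excitation sample stored in the stochastic codebook 304 (stochastic code vector, or stochastic excitation) and sends each to the perceptual weighting LPC synthesis unit 306. The perceptual weighting LPC synthesis unit 306 filters the two excitations obtained from the excitation generation unit 305 with the decoded LPC coefficients obtained by the LPC analysis unit 302 to obtain two synthesized sounds.
[0104]
The gain calculation unit 308 analyzes the relationship between the two synthesized sounds obtained by the perceptual weighting LPC synthesis unit 306 and the input speech perceptually weighted by the perceptual weighting unit 307, and finds the optimum values (optimum gains) for the two synthesized sounds. It adds the synthesized sounds after power adjustment with the optimum gains to obtain the total synthesized sound, and calculates the coding distortion between the total synthesized sound and the input speech. The gain calculation unit 308 also operates the excitation generation unit 305 and the perceptual weighting LPC synthesis unit 306 for all excitation samples of the adaptive codebook 303 and the stochastic codebook 304, calculates the coding distortion between each of the resulting synthesized sounds and the input speech, and finds the indices of the excitation samples that give the smallest coding distortion.
[0105]
Next, the excitation sample indices obtained, the two excitations corresponding to those indices, and the input speech are sent to the parameter encoding unit 309. The parameter encoding unit 309 encodes the gains to obtain a gain code and sends it to the transmission path together with the LPC code and the excitation sample indices.
[0106]
The parameter encoding unit 309 also creates the actual excitation signal from the two excitations corresponding to the gain code and the excitation sample indices, stores it in the adaptive codebook 303, and discards old excitation samples at the same time.
[0107]
The perceptual weighting LPC synthesis unit 306 uses a perceptual weighting filter based on the LPC coefficients, a high-frequency emphasis filter, and long-term prediction coefficients (obtained by long-term prediction analysis of the input speech).
[0108]
The gain calculation unit 308 compares all excitations of the adaptive codebook 303 and the stochastic codebook 304 obtained from the excitation generation unit 305 with the input speech, but to reduce computation the two excitations (adaptive codebook 303 and stochastic codebook 304) are searched in an open loop as described above.
[0109]
Thus, with the pitch search method of this embodiment, pitch analysis of the subframes making up the frame is performed and correlation values are calculated before the adaptive codebook search of the first subframe, so the correlation values of all subframes in the frame can be grasped at the same time.
[0110]
The correlation values of each subframe are calculated, the value most likely to be the pitch period in each subframe (called the representative pitch) is found from the magnitudes of the correlation values, and the lag search ranges of the subframes are set on the basis of the correlation values and representative pitches obtained by the pitch analysis. In setting the search range, the representative pitches and correlation values of the subframes are used to find suitable provisional pitches (called tentative pitches) that differ little from one another and serve as the centers of the search ranges.
[0111]
Furthermore, because the lag search section is limited to a specified range before and after the tentative pitch found in the search range setting, the adaptive codebook can be searched efficiently. Because fewer candidates are allotted to short lags and a wider range to longer lags, a suitable search range that gives good performance can be set. And because the lag is searched within the range set in the search range setting during the adaptive codebook search, encoding that yields good decoded sound becomes possible.
[0112]
Thus, according to this embodiment, the tentative pitch of the second subframe lies near the tentative pitch of the first subframe obtained by the search range setting unit 311, and because the search range of the first subframe is narrowed, the lag obtained by the search does not stray from the tentative pitch. Therefore, when the second subframe is searched, the neighborhood of its tentative pitch can be searched, and an appropriate lag search is possible in both the first and second subframes even for non-stationary frames, such as frames in which speech begins in the second half, an effect not obtainable conventionally.
[0113]
The speech encoding and decoding of Embodiments 1 and 2 have been described as a speech encoding device and a speech decoding device, but they may also be implemented in software. For example, the speech encoding/decoding program may be stored in a ROM and executed under the direction of a CPU. Alternatively, the program, the adaptive codebook, and the stochastic codebook (pulse spreading codebook) may be stored on a computer-readable storage medium, loaded from that medium into the RAM of a computer, and run according to the program. These cases provide the same operation and effects as Embodiments 1 and 2. Furthermore, the programs of Embodiments 1 and 2 may be downloaded to a communication terminal and run on that terminal.
[0114]
Embodiments 1 and 2 may be implemented individually or in combination.
[0115]
[Effects of the invention]
As described above, the speech encoding device of the present invention adjusts the prediction coefficients used for predictive coding according to the state of previous subframes, so the prediction coefficients can be controlled according to each code vector, which enables more efficient prediction adapted to the local characteristics of the speech and prevents the adverse effects of prediction in non-stationary portions.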
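[Brief description of the drawings]
[FIG. 1] Block diagram showing the configuration of a wireless communication device equipped with the speech encoding device of the present invention
[FIG. 2] Block diagram showing the configuration of the speech encoding device according to Embodiment 1 of the present invention
[FIG. 3] Block diagram showing the configuration of the gain calculation unit in the speech encoding device shown in FIG. 2
[FIG. 4] Block diagram showing the configuration of the parameter encoding unit in the speech encoding device shown in FIG. 2
[FIG. 5] Block diagram showing the configuration of a speech decoding device that decodes speech data encoded by the speech encoding device according to Embodiment 1 of the present invention
[FIG. 6] Diagram for explaining the adaptive codebook search
[FIG. 7] Block diagram showing the configuration of the speech encoding device according to Embodiment 2 of the present invention
[Explanation of reference numerals]
102, 302 LPC analysis unit
103, 303 Adaptive codebook
104, 304 Stochastic codebook
105, 305 Excitation generation unit
106, 306 Perceptual weighting LPC synthesis unit
107, 307 Perceptual weighting unit
108, 308 Gain calculation unit
109, 309 Parameter encoding unit
310 Pitch analysis unit
311 Search range setting unit
1091 Parameter calculation unit
1092 Coding distortion calculation unit
1093 Comparison unit
1094 Vector codebook
1095 Prediction coefficient storage unit
1096 Decoded vector storage unit

[0001]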
[Technical field of the invention]
The present invention relates to a speech coding apparatus used in a digital communication system.
[0002]
[Prior art]
In the field of digital mobile communications such as cellular phones, low bit rate speech compression and coding methods are required to cope with the increase in subscribers, and research and development are progressing in each research institution.
[0003]
In Japan, the VSELP encoding method developed by Motorola, with a bit rate of 11.2 kbps, was adopted as the standard encoding method for digital mobile phones, and digital mobile phones equipped with this method have been on sale in Japan since the fall of 1994.
[0004]
In addition, an encoding method called PSI-CELP, with a bit rate of 5.6 kbps, developed by NTT Mobile Communication Network Corporation, has been commercialized. Each of these systems is an improvement on the system called CELP (Code Excited Linear Prediction, described in M. R. Schroeder, "High Quality Speech at Low Bit Rates," Proc. ICASSP '85, pp. 937-940).
[0005]
This CELP method separates speech into sound source information and vocal tract information, encodes the sound source information as indices of sound source samples stored in a codebook, and encodes the vocal tract information as LPC (linear prediction coefficients). It is also characterized by analysis by synthesis (A-b-S): when the sound source information is encoded, each candidate is compared with the input speech after the vocal tract information has been taken into account.
[0006]
In this CELP method, autocorrelation analysis and LPC analysis are first performed on the input speech data (input speech) to obtain LPC coefficients, and the obtained LPC coefficients are encoded to obtain an LPC code. The LPC code is then decoded to obtain decoded LPC coefficients. Meanwhile, the input speech is perceptually weighted by a perceptual weighting filter that uses the LPC coefficients.
[0007]
Each code vector of the sound source samples stored in the adaptive codebook and the stochastic codebook (referred to as the adaptive code vector, or adaptive sound source, and the stochastic code vector, or stochastic sound source, respectively) is filtered with the decoded LPC coefficients to obtain two synthesized sounds.
[0008]
Then, the relationship between the two synthesized sounds and the perceptually weighted input speech is analyzed to obtain the optimum values (optimum gains) for the two synthesized sounds. The synthesized sounds are power-adjusted with the optimum gains and added to obtain a total synthesized sound, and the coding distortion between the total synthesized sound and the input speech is calculated. In this way, the coding distortion between the total synthesized sound and the input speech is obtained for all sound source samples, and the indices of the sound source samples that minimize the coding distortion are found.
[0009]
The gain and the index of the sound source sample thus obtained are encoded, and the encoded gain and the sound source sample are sent to the transmission path together with the LPC code. In addition, an actual sound source signal is created from two sound sources corresponding to the gain code and the index of the sound source sample, stored in the adaptive codebook, and at the same time, the old sound source sample is discarded.
[0010]
In general, the sound source search for the adaptive codebook and the stochastic codebook is performed in a section (called a subframe) obtained by further dividing the analysis section.
[0011]
The gain coding (gain quantization) is performed by vector quantization (VQ) that evaluates gain quantization distortion using two synthesized sounds corresponding to the index of the sound source sample.
[0012]
In this algorithm, a vector codebook storing a plurality of representative samples (code vectors) of the parameter vector is created in advance. Next, for the perceptually weighted input speech and the perceptually weighted LPC-synthesized adaptive and stochastic sound sources, the coding distortion is calculated with Equation 1 below using the gain code vectors stored in the vector codebook.
[0013]
[Equation 1]
$$E_n = \sum_{i=0}^{I-1} \left( X_i - g_n A_i - h_n S_i \right)^2$$
where
E_n: coding distortion when the n-th gain code vector is used
X_i: perceptually weighted speech
A_i: perceptually weighted LPC-synthesized adaptive sound source
S_i: perceptually weighted LPC-synthesized stochastic sound source
g_n: code vector element (gain of the adaptive sound source)
h_n: code vector element (gain of the stochastic sound source)
n: code vector number
i: index of the sound source data
I: subframe length (coding unit of the input speech)
[0014]
Next, the vector codebook is controlled so that the distortions E_n obtained with each code vector are compared. The number of the code vector that gives the smallest distortion among all code vectors stored in the vector codebook is taken as the code of the vector.
[0015]
Equation 1 appears at first glance to require a lot of calculation for each n, but because the sums of products over i can be calculated in advance, the search over n can be carried out with a small amount of calculation.
[0016]
The speech decoding apparatus (decoder), on the other hand, decodes the encoded data by looking up the code vector that corresponds to the transmitted vector code.
[0017]
Further improvements have been made on the basis of the above algorithm. For example, using the fact that human perception of sound pressure is logarithmic, the power is quantized in the logarithmic domain and the two gains normalized by that power are vector-quantized. This method is used in the standard Japanese PDC half-rate codec. There is also a method that encodes the gain parameters by exploiting their inter-frame correlation (predictive coding). This method is used in the ITU-T international standard G.729. However, even with these improvements, sufficient performance has not been obtained.
[0018]
[Problems to be solved by the invention]
Gain information coding methods using human auditory characteristics and inter-frame correlation have been developed, and gain information can now be coded with reasonable efficiency. In particular, predictive quantization has greatly improved performance, but the conventional method performs predictive quantization using the values of previous subframes unchanged as the state values. Some of the values stored as state are extremely large (or small), and if such a value is used for the next subframe, quantization of that subframe does not work well and a local abnormal sound may result.
[0019]
The present invention has been made in view of this point, and an object thereof is to provide a CELP speech coding apparatus that can perform speech coding using predictive quantization without generating local abnormal sounds.
[0020]
[Means for Solving the Problems]
The gist of the present invention is to prevent local abnormal sounds by automatically adjusting the prediction coefficients in predictive quantization when the state value of a previous subframe is extremely large or extremely small.
[0021]
[Embodiments of the invention]
The speech coding apparatus according to the present invention comprises: LPC synthesis means for obtaining synthesized sounds by filtering the adaptive sound source and the stochastic sound source stored in the adaptive codebook and the stochastic codebook with LPC coefficients obtained from the input speech; gain calculating means for obtaining the gains of the adaptive sound source and the stochastic sound source and searching for the codes of the adaptive sound source and the stochastic sound source using the coding distortion between the input speech and the synthesized sound obtained with those gains; and parameter encoding means for performing predictive coding of the gains using the adaptive sound source and the stochastic sound source corresponding to the codes found. The parameter encoding means includes prediction coefficient adjustment means for automatically adjusting the prediction coefficients used for the predictive coding according to the state of previous subframes.
[0022]
According to this configuration, the prediction coefficients can be controlled in accordance with each code vector, which enables more efficient prediction adapted to the local characteristics of the speech and prevents the adverse effects of prediction in non-stationary portions.
[0023]
The speech coding apparatus according to the present invention is also a CELP speech coding apparatus that performs encoding by dividing one frame into a plurality of subframes and comprises: LPC synthesis means for obtaining synthesized sounds by filtering the adaptive sound source and the stochastic sound source stored in the adaptive codebook and the stochastic codebook with LPC coefficients obtained from the input speech; gain calculation means for obtaining the gains of the adaptive sound source and the stochastic sound source; and parameter encoding means for vector-quantizing the adaptive sound source and the stochastic sound source found using the coding distortion between the input speech and the synthesized sound, together with the gains. The apparatus further includes pitch analysis means that, before the adaptive codebook search of the first subframe, performs pitch analysis of the plurality of subframes constituting the frame to obtain correlation values and uses the correlation values to calculate the value that best approximates the pitch period.
[0024]
According to this configuration, the vicinity of the tentative pitch of the second subframe can be searched when the second subframe is searched, so an appropriate lag search is possible in both the first and second subframes even for non-stationary frames, such as frames in which speech starts in the second half.
[0025]
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
(Embodiment 1)
FIG. 1 is a block diagram showing a configuration of a wireless communication apparatus provided with a speech encoding apparatus according to Embodiments 1 and 2 of the present invention.
[0026]
In this wireless communication device, the voice is converted into an electrical analog signal by the voice input device 11 such as a microphone on the transmission side, and is output to the A / D converter 12. The analog audio signal is converted into a digital audio signal by the A / D converter 12 and output to the audio encoding unit 13. The speech encoding unit 13 performs speech encoding processing on the digital speech signal and outputs the encoded information to the modem unit 14. The modem unit 14 digitally modulates the encoded audio signal and sends it to the wireless transmission unit 15. The wireless transmission unit 15 performs predetermined wireless transmission processing on the modulated signal. This signal is transmitted via the antenna 16. The processor 21 performs processing using data stored in the RAM 22 and ROM 23 as appropriate.
[0027]
On the other hand, on the reception side of the wireless communication apparatus, the reception signal received by the antenna 16 is subjected to predetermined wireless reception processing by the wireless reception unit 17 and is sent to the modulation / demodulation unit 14. The modem unit 14 performs demodulation processing on the received signal and outputs the demodulated signal to the speech decoding unit 18. The audio decoding unit 18 performs a decoding process on the demodulated signal to obtain a digital decoded audio signal, and outputs the digital decoded audio signal to the D / A converter 19. The D / A converter 19 converts the digital decoded audio signal output from the audio decoding unit 18 into an analog decoded audio signal and outputs the analog decoded audio signal to an audio output device 20 such as a speaker. Finally, the audio output device 20 converts the electrical analog decoded audio signal into decoded audio and outputs it.
[0028]
Here, the speech encoding unit 13 and the speech decoding unit 18 are operated by a processor 21 such as a DSP using a code book stored in the RAM 22 and the ROM 23. These operation programs are stored in the ROM 23.
[0029]
FIG. 2 is a block diagram showing the configuration of the CELP speech coding apparatus according to Embodiment 1 of the present invention. This speech encoding apparatus is included in the speech encoding unit 13 shown in FIG. 1. The adaptive codebook 103 shown in FIG. 2 is stored in the RAM 22 shown in FIG. 1, and the probabilistic codebook 104 shown in FIG. 2 is stored in the ROM 23 shown in FIG. 1.
[0030]
In the speech encoding apparatus shown in FIG. 2, the LPC analysis unit 102 performs autocorrelation analysis and LPC analysis on the input speech data (input speech) 101 to obtain LPC coefficients. Also, the LPC analysis unit 102 encodes the obtained LPC coefficient to obtain an LPC code. Further, the LPC analysis unit 102 decodes the obtained LPC code to obtain decoded LPC coefficients. The input audio data 101 is sent to the perceptual weighting unit 107 where the perceptual weighting is performed using the perceptual weighting filter using the LPC coefficient.
[0031]
Next, in the sound source creation unit 105, a sound source sample stored in the adaptive codebook 103 (adaptive code vector or adaptive sound source) and a sound source sample stored in the probabilistic codebook 104 (stochastic code vector or stochastic sound source) are each extracted and sent to the perceptual weight LPC synthesis unit 106. The perceptual weight LPC synthesis unit 106 then filters the two sound sources obtained by the sound source creation unit 105 with the decoded LPC coefficients obtained by the LPC analysis unit 102 to obtain two synthesized sounds.
[0032]
Note that the perceptual weight LPC synthesis unit 106 performs perceptual weighting LPC synthesis on each synthesized sound using a perceptual weighting filter based on the LPC coefficients, a high-frequency emphasis filter, and long-term prediction coefficients (obtained by long-term prediction analysis of the input speech).
[0033]
The perceptual weight LPC synthesis unit 106 outputs the two synthesized sounds to the gain calculation unit 108. The gain calculation unit 108 has the configuration shown in FIG. 3. In the gain calculation unit 108, the two synthesized sounds obtained by the perceptual weight LPC synthesis unit 106 and the perceptually weighted input speech are sent to the analysis unit 1081, where the relationship between the two synthesized sounds and the input speech is analyzed and the optimum values (optimum gains) of the two synthesized sounds are obtained. These optimum gains are output to the power adjustment unit 1082.
[0034]
The power adjustment unit 1082 adjusts the power of the two synthesized sounds with the obtained optimum gain. The power-adjusted synthesized sound is output to the synthesizing unit 1083, where it is added to become a total synthesized sound. This total synthesized sound is output to the coding distortion calculation unit 1084. The coding distortion calculation unit 1084 obtains coding distortion between the obtained synthetic speech and input speech.
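The text does not state how the analysis unit 1081 derives the optimum gains. A common choice, assumed here purely for illustration, is a joint least-squares fit of the two synthesized sounds to the weighted input speech. The function names below are hypothetical and the signals are plain Python lists.

```python
def optimal_gains(x, a, s):
    """Least-squares gains (ga, gs) minimizing sum((x - ga*a - gs*s)**2).

    Assumption: the patent does not specify this method; this is the
    standard 2x2 normal-equation solution.
    """
    aa = sum(v * v for v in a)
    ss = sum(v * v for v in s)
    as_ = sum(u * v for u, v in zip(a, s))
    xa = sum(u * v for u, v in zip(x, a))
    xs = sum(u * v for u, v in zip(x, s))
    det = aa * ss - as_ * as_
    if abs(det) < 1e-12:
        raise ValueError("synthesized sounds are (nearly) collinear")
    ga = (xa * ss - xs * as_) / det
    gs = (xs * aa - xa * as_) / det
    return ga, gs


def total_synthesis(a, s, ga, gs):
    """Power-adjust both synthesized sounds and add them (units 1082, 1083)."""
    return [ga * u + gs * v for u, v in zip(a, s)]


def coding_distortion(x, y):
    """Squared error between input speech and total synthesized sound (unit 1084)."""
    return sum((u - v) ** 2 for u, v in zip(x, y))
```

When the input is an exact mix of the two synthesized sounds, the recovered gains reproduce it and the distortion is zero.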
[0035]
The coding distortion calculation unit 1084 controls the sound source creation unit 105 so that it outputs all the sound source samples of the adaptive codebook 103 and the stochastic codebook 104, obtains the coding distortion between the total synthesized sound and the input speech for every sound source sample, and finds the index of the sound source sample that gives the smallest coding distortion.
[0036]
Next, the analysis unit 1081 sends the index of the sound source sample, two perceptually weighted LPC synthesized sound sources corresponding to the index, and the input speech to the parameter encoding unit 109.
[0037]
The parameter encoding unit 109 obtains a gain code by performing gain encoding, and sends it together with the LPC code and the index of the sound source sample to the transmission path. Further, an actual sound source signal is created from the two sound sources corresponding to the gain code and the index, and is stored in the adaptive codebook 103, while the old sound source sample is discarded at the same time. In general, the sound source search for the adaptive codebook and the stochastic codebook is performed in sections (called subframes) obtained by further dividing the analysis section.
[0038]
Here, the operation of gain encoding of parameter encoding section 109 of the speech encoding apparatus having the above configuration will be described. FIG. 4 is a block diagram showing the configuration of the parameter encoding unit of the speech encoding apparatus of the present invention.
[0039]
In FIG. 4, the perceptually weighted input speech (X_i), the perceptually weighted LPC-synthesized adaptive sound source (A_i), and the perceptually weighted LPC-synthesized stochastic sound source (S_i) are sent to the parameter calculation unit 1091. The parameter calculation unit 1091 calculates the parameters necessary for the coding distortion calculation. These parameters are output to the coding distortion calculation unit 1092, where the coding distortion is calculated. The coding distortion is output to the comparison unit 1093. The comparison unit 1093 controls the coding distortion calculation unit 1092 and the vector codebook 1094 to find the most appropriate code from the obtained coding distortions, outputs the code vector (decoded vector) obtained from the vector codebook 1094 on the basis of this code to the decoded vector storage unit 1096, and thereby updates the decoded vector storage unit 1096.
[0040]
The prediction coefficient storage unit 1095 stores prediction coefficients used for predictive coding. Since this prediction coefficient is used for parameter calculation and coding distortion calculation, it is output to the parameter calculation section 1091 and coding distortion calculation section 1092. The decoded vector storage unit 1096 stores a state for predictive encoding. Since this state is used for parameter calculation, it is output to the parameter calculation unit 1091. The vector codebook 1094 stores code vectors.
[0041]
Next, an algorithm of the gain encoding method according to the present invention will be described.
A vector codebook 1094 in which a plurality of representative samples (code vectors) of quantization target vectors are stored in advance is created. Each vector includes three elements: an AC gain, a value corresponding to a logarithmic value of the SC gain, and an adjustment coefficient of the SC prediction coefficient.
[0042]
This adjustment coefficient is a coefficient that adjusts the prediction coefficient in accordance with the state of the previous subframe. Specifically, this adjustment coefficient is set to reduce the influence when the state of the previous subframe is an extremely large value or an extremely small value. This adjustment coefficient can be obtained by a learning algorithm developed by the present inventors using a large number of vector samples. Here, description of this learning algorithm is omitted.
[0043]
For example, a large adjustment coefficient is set for a code vector frequently used for voiced sound. That is, when the same waveform repeats, the state of the previous subframe is highly reliable, so the adjustment coefficient is made large and the prediction coefficient of the previous subframe is used as it is. This allows more efficient prediction.
[0044]
On the other hand, a small adjustment coefficient is set for a code vector that is used less frequently. That is, when the previous waveform is completely different, the state of the previous subframe is unreliable (the adaptive codebook is considered not to be functioning), so the adjustment coefficient is made small to reduce the influence of the prediction coefficient of the previous subframe. This prevents an adverse effect on the next prediction and realizes good predictive coding.
[0045]
In this way, controlling the prediction coefficient according to each code vector (state) improves on the performance of conventional predictive coding.
[0046]
The prediction coefficient storage unit 1095 stores the prediction coefficients for performing predictive coding. These are MA (moving average) prediction coefficients, and two sets, one for AC and one for SC, are stored, each containing as many coefficients as the prediction order. The prediction coefficient values are generally obtained in advance by learning from a large amount of data. The decoded vector storage unit 1096 stores a value indicating a silent state as its initial value.
[0047]
Next, the encoding method will be described in detail. First, the perceptually weighted input speech (X_i), the perceptually weighted LPC-synthesized adaptive sound source (A_i), the perceptually weighted LPC-synthesized stochastic sound source (S_i), the decoded vectors (AC, SC, adjustment coefficient) stored in the decoded vector storage unit 1096, and the prediction coefficients (AC, SC) stored in the prediction coefficient storage unit 1095 are sent to the parameter calculation unit 1091, which uses them to calculate the parameters necessary for the coding distortion calculation.
[0048]
The encoding distortion calculation in the encoding distortion calculation unit 1092 is performed according to the following equation 2.
[0049]
[Expression 2]

here,
G_an, G_sn: Decoding gain
E_n: Coding distortion when using the nth gain code vector
X_i: Audible weighted voice
A_i: Auditory weighted LPC synthesized adaptive sound source
S_i: Auditory weighted LPC synthesized stochastic sound source
n: Code vector number
i: Index of sound source vector
I: Subframe length (coding unit of input speech)
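The image for Expression 2 is not reproduced in this text. A form consistent with the symbol definitions above, offered as a reconstruction under that assumption rather than as the patent's verbatim expression, is the squared error between the weighted input speech and the gain-scaled sum of the two synthesized sound sources:

```latex
E_n = \sum_{i=0}^{I-1} \left( X_i - G_{an} A_i - G_{sn} S_i \right)^2
```

The summation limits (0 to I-1) are an assumed indexing convention.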
[0050]
In this case, to reduce the amount of calculation, the parameter calculation unit 1091 calculates in advance the portions that do not depend on the code vector number. These are the prediction vector and the correlations and powers among the three synthesized sounds (X_i, A_i, S_i). The correlations and powers are calculated according to Equation 3 below.
[0051]
[Equation 3]

D_xx, D_xa, D_xs, D_aa, D_as, D_ss: Correlation value between synthesized sounds, power
X_i: Audible weighted voice
A_i: Auditory weighted LPC synthesized adaptive sound source
S_i: Auditory weighted LPC synthesized stochastic sound source
n: Code vector number
i: Index of sound source vector
I: Subframe length (coding unit of input speech)
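The image for Equation 3 is likewise missing. The sketch below assumes the six quantities are plain inner products of the three signals and shows why precomputing them pays off: the squared-error distortion can then be evaluated for any gain pair without touching the sample buffers again. Whether the original folds the factors of 2 into D_xa, D_xs and D_as cannot be determined from this text, so they are kept explicit here. All function names are hypothetical.

```python
def correlations(x, a, s):
    """Inner products among weighted input x and synthesized sources a, s."""
    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))
    return {
        "xx": dot(x, x), "xa": dot(x, a), "xs": dot(x, s),
        "aa": dot(a, a), "as": dot(a, s), "ss": dot(s, s),
    }


def distortion_direct(x, a, s, g_a, g_s):
    """Sample-by-sample squared error for one pair of decoded gains."""
    return sum((xi - g_a * ai - g_s * si) ** 2 for xi, ai, si in zip(x, a, s))


def distortion_from_correlations(d, g_a, g_s):
    """Same value, using only the six precomputed terms."""
    return (d["xx"] + g_a * g_a * d["aa"] + g_s * g_s * d["ss"]
            - 2.0 * g_a * d["xa"] - 2.0 * g_s * d["xs"]
            + 2.0 * g_a * g_s * d["as"])
```

The per-code-vector cost drops from one pass over the subframe to a handful of multiplications.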
[0052]
In addition, the parameter calculation unit 1091 calculates in advance the three prediction values shown in Equation 4 below, using the past code vectors stored in the decoded vector storage unit 1096 and the prediction coefficients stored in the prediction coefficient storage unit 1095.
[0053]
[Expression 4]

here,
P_ra: Predicted value (AC gain)
P_rs: Predicted value (SC gain)
P_sc: Predicted value (prediction coefficient)
α_m: Prediction coefficient (AC gain, fixed value)
β_m: Prediction coefficient (SC gain, fixed value)
S_am: State (element of past code vector, AC gain)
S_sm: State (element of past code vector, SC gain)
S_cm: State (element of past code vector, SC prediction coefficient adjustment coefficient)
m: Predictive index
M: Predicted order
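The image for Expression 4 is not reproduced. The following sketch assumes the usual MA-prediction form suggested by the symbol list: the AC prediction is a weighted sum of past AC states, while the SC prediction and the SC coefficient sum are additionally weighted by the stored adjustment coefficients S_cm. The exact index range and the function name are assumptions.

```python
def predicted_values(alpha, beta, s_a, s_s, s_c):
    """Return (P_ra, P_rs, P_sc) from prediction coefficients and past states.

    alpha, beta   : fixed MA prediction coefficients for AC and SC
    s_a, s_s, s_c : past states (AC gain, SC gain, adjustment coefficient),
                    most recent first; all lists have the same length.
    """
    p_ra = sum(al * sa for al, sa in zip(alpha, s_a))
    # The adjustment coefficient scales each SC term, so an unreliable
    # past state (small s_c) contributes less to the prediction.
    p_rs = sum(be * sc * ss for be, sc, ss in zip(beta, s_c, s_s))
    p_sc = sum(be * sc for be, sc in zip(beta, s_c))
    return p_ra, p_rs, p_sc
```

With all adjustment coefficients equal to 1, this reduces to ordinary fixed-coefficient MA prediction.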
[0054]
As can be seen from Equation 4 above, P_rs and P_sc are multiplied by the adjustment coefficient, unlike the conventional case. Therefore, for the prediction value and prediction coefficient of the SC gain, when the state value of the previous subframe is extremely large or small, its influence can be reduced by the adjustment coefficient. That is, the SC gain prediction value and prediction coefficient can be changed adaptively according to the state.
[0055]
Next, the coding distortion calculation unit 1092 calculates the coding distortion according to Equation 5 below, using the parameters calculated by the parameter calculation unit 1091, the prediction coefficients stored in the prediction coefficient storage unit 1095, and the code vectors stored in the vector codebook 1094.
[0056]
[Equation 5]

here,
E_n: Coding distortion when using the nth gain code vector
D_xx, D_xa, D_xs, D_aa, D_as, D_ss: Correlation value between synthesized sounds, power
G_an, G_sn: Decoding gain
P_ra: Predicted value (AC gain)
P_rs: Predicted value (SC gain)
P_ac: Sum of prediction coefficients (fixed value)
P_sc: Sum of prediction coefficients (calculated by equation 4 above)
C_an, C_sn, C_cn: Code vector (C_cn is the prediction coefficient adjustment coefficient, but it is not used here)
n: Code vector number
In practice, D_xx does not depend on the code vector number n, so its addition can be omitted.
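The image for Equation 5 is missing, so the gain decoding rule below is an assumption chosen to match the symbol list: the AC gain is predicted in the linear domain, the SC gain in the logarithmic domain, and each code vector element is scaled by one minus the corresponding prediction coefficient sum. The search then picks the code vector with the smallest distortion, leaving out the constant D_xx term. The correlation terms are assumed to be plain inner products, and all names are hypothetical.

```python
def decode_gains(code, p_ra, p_rs, p_ac, p_sc):
    """Assumed decoding: G_an linear, G_sn from a log-domain value."""
    c_a, c_s, _c_c = code  # the adjustment coefficient is not used here
    g_a = p_ra + (1.0 - p_ac) * c_a
    g_s = 10.0 ** (p_rs + (1.0 - p_sc) * c_s)
    return g_a, g_s


def search_gain_code(codebook, d, p_ra, p_rs, p_ac, p_sc):
    """Return the code vector number n with the smallest distortion."""
    best_n, best_e = -1, float("inf")
    for n, code in enumerate(codebook):
        g_a, g_s = decode_gains(code, p_ra, p_rs, p_ac, p_sc)
        # D_xx is the same for every n, so it is left out of the comparison.
        e = (g_a * g_a * d["aa"] + g_s * g_s * d["ss"]
             - 2.0 * g_a * d["xa"] - 2.0 * g_s * d["xs"]
             + 2.0 * g_a * g_s * d["as"])
        if e < best_e:
            best_n, best_e = n, e
    return best_n
```

Because D_xx is dropped, the compared values can be negative; only their ordering matters.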
[0057]
Next, the comparison unit 1093 controls the vector codebook 1094 and the coding distortion calculation unit 1092, finds, among the plurality of code vectors stored in the vector codebook 1094, the number of the code vector that gives the smallest coding distortion calculated by the coding distortion calculation unit 1092, and uses this number as the gain code. The content of the decoded vector storage unit 1096 is also updated using the obtained gain code. The update is performed according to Equation 6 below.
[0058]
[Formula 6]

here,
S_am, S_sm, S_cm: State vector (AC, SC, prediction coefficient adjustment coefficient)
m: Predictive index
M: Predicted order
J: Code obtained by the comparison unit
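The image for Formula 6 is missing. A plausible reading, assumed here, is a shift-register update: every stored state moves one position toward the past, the oldest is dropped, and the three elements of the selected code vector J become the newest state. The function name and the tuple layout are hypothetical.

```python
def update_state(state, selected_code_vector):
    """Shift the stored states by one and insert the newly selected vector.

    state: list of (S_a, S_s, S_c) tuples, index 0 being the most recent.
    selected_code_vector: (C_aJ, C_sJ, C_cJ) chosen by the comparison unit.
    """
    return [tuple(selected_code_vector)] + list(state[:-1])
```

The list length (the prediction order) stays constant across updates.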
[0059]
As can be seen from Expression 4 to Expression 6, in this embodiment the prediction coefficient adjustment coefficient S_cm, which is an element of the past code vectors, is stored in the decoded vector storage unit 1096, and the prediction coefficient is adaptively controlled using this adjustment coefficient.
[0060]
FIG. 5 is a block diagram showing the configuration of the speech decoding apparatus according to the embodiment of the present invention. This speech decoding apparatus is included in the speech decoding unit 18 shown in FIG. 1. The adaptive codebook 202 shown in FIG. 5 is stored in the RAM 22 shown in FIG. 1, and the probabilistic codebook 203 shown in FIG. 5 is stored in the ROM 23 shown in FIG. 1.
[0061]
In the speech decoding apparatus shown in FIG. 5, the parameter decoding unit 201 obtains the encoded speech signal from the transmission path and extracts from it the codes of the excitation samples of each excitation codebook (adaptive codebook 202 and probabilistic codebook 203), the LPC code, and the gain code. It then obtains the decoded LPC coefficients from the LPC code and the decoded gains from the gain code.
[0062]
Then, the sound source creation unit 204 obtains a decoded sound source signal by multiplying each sound source sample by its decoded gain and adding the results. At this time, the obtained decoded sound source signal is stored in the adaptive codebook 202 as a sound source sample, and the old sound source sample is discarded at the same time. The LPC synthesis unit 205 then obtains a synthesized sound by filtering the decoded sound source signal with the decoded LPC coefficients.
[0063]
The two excitation codebooks are the same as those included in the speech encoding apparatus shown in FIG. 2 (reference numerals 103 and 104 in FIG. 2), and the sample numbers for extracting excitation samples (the code for the adaptive codebook and the code for the stochastic codebook) are both supplied from the parameter decoding unit 201.
[0064]
As described above, in the speech coding apparatus according to the present embodiment, the prediction coefficient can be controlled according to each code vector. This enables more efficient prediction adapted to local features of speech and prevents the adverse effects of prediction in non-stationary parts, yielding effects that could not be obtained conventionally.
[0065]
(Embodiment 2)
In the speech coding apparatus, as described above, the gain calculation unit compares the synthesized sound with the input speech for all the sound sources of the adaptive codebook and the stochastic codebook obtained from the sound source creation unit. At this time, the two sound sources (adaptive codebook and probabilistic codebook) are normally searched in an open loop to limit the amount of computation. This is described below with reference to FIG. 2.
[0066]
In this open loop search, first, the sound source creation unit 105 selects sound source candidates one after another only from the adaptive codebook 103, obtains a synthesized sound by causing the perceptual weight LPC synthesis unit 106 to function, and sends it to the gain calculation unit 108. A comparison between the synthesized speech and the input speech is performed to select an optimum code of the adaptive codebook 103.
[0067]
Next, with the code of the adaptive codebook 103 fixed, the same sound source is selected from the adaptive codebook 103 while sound sources corresponding to the codes specified by the gain calculation unit 108 are selected one after another from the probabilistic codebook 104, and both are sent to the perceptual weight LPC synthesis unit 106. The gain calculation unit 108 compares the sum of the two synthesized sounds with the input speech to determine the code of the stochastic codebook 104.
[0068]
When this algorithm is used, the coding performance is slightly deteriorated compared to searching all codes of all codebooks, but the calculation amount is greatly reduced. For this reason, this open loop search is generally used.
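The two-stage search described above can be sketched as follows. This is a toy illustration, not the patent's implementation: it assumes the candidates have already been passed through the weighted synthesis filter, and it assumes each candidate is scored with its own least-squares gains. Function names are hypothetical.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def joint_residual(x, a, s):
    """Squared error left after fitting x with optimal gains on a and s."""
    aa, ss, as_ = dot(a, a), dot(s, s), dot(a, s)
    xa, xs = dot(x, a), dot(x, s)
    det = aa * ss - as_ * as_
    if abs(det) < 1e-12:  # degenerate pair: use the adaptive source alone
        return dot(x, x) - (xa * xa / aa if aa > 0 else 0.0)
    ga = (xa * ss - xs * as_) / det
    gs = (xs * aa - xa * as_) / det
    return dot(x, x) - ga * xa - gs * xs


def open_loop_search(x, adaptive, stochastic):
    """Stage 1: best adaptive candidate alone. Stage 2: best stochastic
    candidate with the adaptive choice held fixed."""
    def score(a):
        aa = dot(a, a)
        return dot(x, a) ** 2 / aa if aa > 0 else 0.0
    best_a = max(range(len(adaptive)), key=lambda k: score(adaptive[k]))
    best_s = min(range(len(stochastic)),
                 key=lambda k: joint_residual(x, adaptive[best_a], stochastic[k]))
    return best_a, best_s
```

With N adaptive and K stochastic candidates, this evaluates N + K candidates instead of the N × K pairs of an exhaustive search, which is the source of the computational saving the text mentions.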
[0069]
Here, a typical algorithm in the conventional open loop sound source search will be described. Here, a sound source search procedure in the case where a single analysis section (frame) is composed of two subframes will be described.
[0070]
First, upon receiving an instruction from the gain calculation unit 108, the sound source creation unit 105 extracts a sound source from the adaptive codebook 103 and sends it to the perceptual weight LPC synthesis unit 106. In the gain calculation unit 108, the comparison between the synthesized sound source and the input speech of the first subframe is repeated to find the optimum code. A feature of the adaptive codebook should be noted here: it holds the sound sources used for synthesis in the past, and its code corresponds to a time lag, as shown in FIG. 6.
[0071]
Next, after the code of the adaptive codebook 103 has been determined, the probabilistic codebook is searched. The sound source creation unit 105 extracts the sound source of the code obtained by the search of the adaptive codebook 103 and the sound source of the probabilistic codebook 104 specified by the gain calculation unit 108, and sends them to the perceptual weight LPC synthesis unit 106. The gain calculation unit 108 then calculates the coding distortion between the perceptually weighted synthesized sound and the perceptually weighted input speech, and determines the most appropriate code of the probabilistic codebook 104 (the one that minimizes the squared error). The excitation code search procedure in one analysis section (with two subframes) is shown below.
[0072]
1) Determine the code of the adaptive codebook of the first subframe
2) Determine the code of the probabilistic codebook of the first subframe
3) The parameter encoding unit 109 encodes the gain, creates the first subframe excitation with the decoding gain, and updates the adaptive codebook 103.
4) Determine the code of the adaptive codebook of the second subframe
5) Determine the code of the probabilistic codebook of the second subframe
6) The parameter encoding unit 109 encodes the gain, creates a second subframe excitation with the decoding gain, and updates the adaptive codebook 103.
[0073]
The sound source can be encoded efficiently by the above algorithm. Recently, however, efforts have been made to reduce the number of bits spent on the sound source in order to lower the bit rate further. Of particular interest is an algorithm that exploits the large correlation between the lags of the adaptive codebook: the code of the first subframe is left as it is, while the search range of the second subframe is narrowed to the vicinity of the lag of the first subframe (reducing the number of entries), thereby reducing the number of bits.
[0074]
In this algorithm, it is conceivable that local degradation is caused when the voice changes from the middle of the analysis section (frame) or when the states of the two subframes are greatly different.
[0075]
The present embodiment provides a speech encoding apparatus that realizes a search method in which, before encoding, pitch analysis is performed on both of the two subframes to calculate correlation values, and the lag search ranges of the two subframes are determined on the basis of the obtained correlation values.
[0076]
Specifically, the speech coding apparatus according to the present embodiment is a CELP type coding apparatus that decomposes one frame into a plurality of subframes and encodes each of them. It comprises a pitch analysis unit that, before the adaptive codebook search of the first subframe, performs pitch analysis of the plurality of subframes constituting the frame to calculate correlation values, and obtains from the magnitude of those correlation values the value most likely to be the pitch period in each subframe (called the representative pitch); and a search range setting unit that determines the lag search range of the plurality of subframes on the basis of the correlation values and the representative pitches obtained by the pitch analysis unit.
[0077]
In this speech encoding apparatus, the search range setting unit uses the representative pitches and correlation values of the plurality of subframes obtained by the pitch analysis unit to find a pitch that serves as the center of the search range (called a temporary pitch), and sets the lag search range to a specified range around that temporary pitch. When the search range is set before and after the temporary pitch, candidates on the short-lag side are reduced and the long-lag side is set wider. During the adaptive codebook search, the lag search is performed within the range set by the search range setting unit.
[0078]
Hereinafter, the speech coding apparatus according to the present embodiment will be described in detail with reference to the accompanying drawings. Here, it is assumed that one frame is divided into two subframes. Even in the case of three or more subframes, encoding can be performed in the same procedure.
[0079]
In this speech coding apparatus, in the pitch search by the so-called delta lag method, the pitch is obtained for all the divided subframes, the degree of correlation between the pitches is obtained, and the search range is determined according to the correlation result.
[0080]
FIG. 7 is a block diagram showing the configuration of the speech coding apparatus according to Embodiment 2 of the present invention. First, the LPC analysis unit 302 obtains LPC coefficients by performing autocorrelation analysis and LPC analysis on the input speech data (input speech) 301. The LPC analysis unit 302 then encodes the obtained LPC coefficients to obtain an LPC code, and further decodes the obtained LPC code to obtain decoded LPC coefficients.
[0081]
Next, the pitch analysis unit 310 performs pitch analysis of the input speech for two subframes to obtain pitch candidates and parameters. The algorithm for one subframe is shown below. Two correlation coefficients are obtained by Equation 7 below. Here, C_pp is first computed for P_min; the values for P_min+1, P_min+2, and so on can then be calculated efficiently by adding and subtracting the values at the ends of the frame.
[0082]
[Expression 7]

here,
X_i, X_{i-P}: Input speech
V_p: Autocorrelation function
C_pp: Power component
i: Input speech sample number
L: Length of subframe
P: Pitch
P_min, P_max: Minimum and maximum values of the pitch search
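The image for Expression 7 is not reproduced. The sketch below assumes the conventional definitions implied by the symbol list: V_p is the correlation between the current subframe and the signal delayed by P, and C_pp is the power of the delayed signal. It also shows the incremental update of C_pp mentioned in the text. The buffer layout (past samples followed by the current subframe) and the function name are assumptions.

```python
def pitch_correlations(buf, start, length, p_min, p_max):
    """Return dicts V[p] and C[p] for p_min <= p <= p_max.

    buf    : samples, with at least p_max past samples before `start`
    start  : index in buf of the first sample of the subframe
    length : subframe length L
    """
    v, c = {}, {}
    # Power of the delayed segment, computed in full only once (for p_min).
    power = sum(buf[start + i - p_min] ** 2 for i in range(length))
    for p in range(p_min, p_max + 1):
        v[p] = sum(buf[start + i] * buf[start + i - p] for i in range(length))
        c[p] = power
        if p < p_max:
            # Slide the delayed window one sample into the past: add the
            # sample entering at the front, drop the one leaving at the end.
            power += buf[start - p - 1] ** 2 - buf[start + length - 1 - p] ** 2
    return v, c
```

For a signal with period 4, V_p peaks at P = 4 while C_pp stays equal to the directly computed power at every lag.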
[0083]
Then, the autocorrelation function and the power component obtained by Equation 7 above are stored in a memory, and the representative pitch P₁ is obtained. This is a process of finding the pitch P for which V_p is positive and V_p × V_p / C_pp is maximized. However, since division is generally computationally expensive, both the numerator and the denominator are stored, and the comparison is performed by multiplication to improve efficiency.
[0084]
Here, the search looks for the pitch that minimizes the sum of squares of the difference between the input speech and the adaptive sound source located one pitch earlier. This is equivalent to finding the pitch P that maximizes V_p × V_p / C_pp. The specific processing is as follows.
[0085]
1) Initialization (P = P_min, VV = C = 0, P₁ = P_min).
2) If (V_p × V_p × C < VV × C_pp) or (V_p < 0), go to 4). Otherwise go to 3).
3) Set VV = V_p × V_p, C = C_pp, P₁ = P, and go to 4).
4) Set P = P + 1. If P > P_max, finish; otherwise go to 2).
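Steps 1) to 4) can be written as a single loop. The sketch below follows the listed conditions directly, including the cross-multiplied comparison that avoids division; only the function name and the use of dictionaries keyed by pitch are assumptions.

```python
def representative_pitch(v, c, p_min, p_max):
    """Pitch maximizing V_p*V_p/C_pp among lags with non-negative V_p.

    v, c: mappings from pitch p to V_p and C_pp.
    """
    best_p, vv, cc = p_min, 0.0, 0.0          # step 1
    for p in range(p_min, p_max + 1):         # step 4 (increment and bound)
        # Step 2: V_p^2 / C_pp < VV / C, compared without division.
        if v[p] < 0 or v[p] * v[p] * cc < vv * c[p]:
            continue
        vv, cc, best_p = v[p] * v[p], c[p], p  # step 3
    return best_p
```

Because the comparison is a strict inequality, a later candidate that ties the current best replaces it, exactly as the listed steps imply.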
[0086]
The above operation is performed for each of the two subframes to obtain the representative pitches P₁ and P₂, the autocorrelation coefficients V_1p and V_2p, and the power components C_1pp and C_2pp (P_min < p < P_max).
[0087]
Next, the search range setting unit 311 sets the search range of the lag of the adaptive codebook. First, a temporary pitch that is the axis of the search range is obtained. The provisional pitch is determined using the representative pitch and parameters obtained by the pitch analysis unit 310.
[0088]
The temporary pitches Q₁ and Q₂ are obtained by the following procedure. In the following description, a constant Th (specifically, a value of about 6 is appropriate) is used as the lag range, and the correlation values obtained by Equation 7 are used.
[0089]
First, with P₁ fixed, the temporary pitch Q₂ is searched for in the vicinity of P₁.
[0090]
1) Initialization (p = P₁ - Th, C_max = 0, Q₁ = P₁, Q₂ = P₁).
2) If (V_1P₁ × V_1P₁ / C_1P₁P₁ + V_2p × V_2p / C_2pp < C_max) or (V_2p < 0), go to 4). Otherwise go to 3).
3) Set C_max = V_1P₁ × V_1P₁ / C_1P₁P₁ + V_2p × V_2p / C_2pp, Q₂ = p, and go to 4).
4) Set p = p + 1 and go to 2). However, if p > P₁ + Th, go to 5).
[0091]
In this way, steps 2) to 4) are performed for p from P₁ - Th to P₁ + Th, and the largest correlation C_max and the temporary pitch Q₂ are obtained.
[0092]
Next, with P₂ fixed, the temporary pitch Q₁ is searched for in the vicinity of P₂. In this case, C_max is not initialized. By searching for the Q₁ that maximizes the correlation while retaining the C_max obtained when Q₂ was found, the pair Q₁, Q₂ with the maximum correlation across the first and second subframes can be obtained.
[0093]
5) Initialization (p = P₂ - Th).
6) If (V_1p × V_1p / C_1pp + V_2P₂ × V_2P₂ / C_2P₂P₂ < C_max) or (V_1p < 0), go to 8). Otherwise go to 7).
7) Set C_max = V_1p × V_1p / C_1pp + V_2P₂ × V_2P₂ / C_2P₂P₂, Q₁ = p, Q₂ = P₂, and go to 8).
8) Set p = p + 1 and go to 6). However, if p > P₂ + Th, go to 9).
9) End.
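The two passes of steps 1) to 9) can be sketched as below. The logic follows the listed conditions; the clamping of the scan to the range [p_min, p_max] and the guard against a zero power component are additions not stated in the text, and the function name is hypothetical.

```python
def provisional_pitches(v1, c1, v2, c2, p1, p2, th, p_min, p_max):
    """Return (Q1, Q2) chosen by the two-pass search around P1 and P2.

    v1, c1 / v2, c2: V_p and C_pp per pitch for subframes 1 and 2.
    """
    def score(v, c, p):
        return v[p] * v[p] / c[p] if c[p] > 0 else 0.0

    c_max, q1, q2 = 0.0, p1, p1                                  # step 1
    for p in range(max(p1 - th, p_min), min(p1 + th, p_max) + 1):
        total = score(v1, c1, p1) + score(v2, c2, p)
        if total < c_max or v2[p] < 0:                           # step 2
            continue
        c_max, q2 = total, p                                     # step 3

    # Second pass: C_max is deliberately carried over (steps 5 to 8).
    for p in range(max(p2 - th, p_min), min(p2 + th, p_max) + 1):
        total = score(v1, c1, p) + score(v2, c2, p2)
        if total < c_max or v1[p] < 0:                           # step 6
            continue
        c_max, q1, q2 = total, p, p2                             # step 7
    return q1, q2
```

In the test case below, subframe 2 has a strong correlation at lag 50 while subframe 1 is only weakly periodic, so both temporary pitches are drawn toward 50 and differ by no more than Th.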
[0094]
In this way, steps 6) to 8) are performed for p from P₂ - Th to P₂ + Th, and the largest correlation C_max and the temporary pitches Q₁ and Q₂ are obtained. The resulting Q₁ and Q₂ are the temporary pitches of the first subframe and the second subframe, respectively.
[0095]
With the above algorithm, it is possible to select two temporary pitches that differ relatively little in size (the maximum difference is Th) while evaluating the correlations of the two subframes simultaneously. By using these temporary pitches, the coding performance can be kept from deteriorating greatly even when the search range is set narrow for the adaptive codebook search of the second subframe. For example, when the sound quality changes abruptly from the second subframe and the correlation of the second subframe is strong, deterioration of the second subframe can be avoided by using a Q₁ that reflects the correlation of the second subframe.
[0096]
Further, the search range setting unit 311 sets the range (L_ST to L_EN) in which the adaptive codebook is searched, based on the obtained temporary pitch Q₁, as shown in Equation 8 below.
[0097]
[Equation 8]

here,
L_ST: Start point of search range
L_EN: End point of search range
L_min: Minimum value of lag (example: 20)
L_max: Maximum value of lag (example: 143)
T₁: Adaptive codebook lag of the first subframe
[0098]
In the above setting, it is not strictly necessary to narrow the search range for the first subframe. However, the present inventors have confirmed through experiments that performance is better when the vicinity of a value based on the pitch of the input speech is used as the search interval. In the present embodiment, an algorithm that searches a range narrowed to 26 samples is used.
[0099]
For the second subframe, the search range is set around the lag T₁ obtained in the first subframe. Therefore, the lag of the adaptive codebook of the second subframe can be encoded with 5 bits, for a total of 32 entries. In addition, the present inventors have confirmed by experiments that better performance is obtained by assigning fewer candidates to small lags and more candidates to large lags. However, as can be seen from the above description, the temporary pitch Q₂ is not used in the present embodiment.
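The image for Equation 8 is missing, so the exact offsets are not recoverable from this text. The sketch below only illustrates the general shape such a rule could take: a window of a fixed number of lag candidates placed asymmetrically around a center value and clamped to [L_min, L_max]. The parameter `below` (how many candidates lie under the center), the clamping behavior, and the function name are all assumptions; only the counts 26 and 32 and the example limits 20 and 143 come from the text.

```python
def lag_search_range(center, below, count, l_min=20, l_max=143):
    """Return (L_ST, L_EN) for a window of `count` lags around `center`.

    below : number of candidates placed at lags smaller than `center`
            (hypothetical parameter; a small value gives more candidates
            on the long-lag side, as the text recommends).
    """
    start = max(center - below, l_min)        # clamp the start point
    end = min(start + count - 1, l_max)       # clamp the end point
    return start, end
```

For example, `lag_search_range(q1, 5, 26)` would give a 26-sample window for the first subframe and `lag_search_range(t1, 10, 32)` a 32-entry window for the second, with the offsets 5 and 10 chosen here only for illustration.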
[0100]
The effect of the present embodiment is now described. The temporary pitch of the second subframe lies near the temporary pitch of the first subframe obtained by the search range setting unit 311 (because the difference is limited by the constant Th). In addition, since the search in the first subframe is performed over a narrowed range, the lag obtained as a result of the search does not stray far from the temporary pitch of the first subframe.
[0101]
Therefore, when searching for the second subframe, a range close to the provisional pitch of the second subframe can be searched, and an appropriate lag can be searched for in both the first and second subframes.
[0102]
As an example, consider a case where the first subframe is silent and the voice rises from the second subframe. In the conventional method, if narrowing the search range excludes the pitch of the second subframe from the search section, the sound quality deteriorates greatly. In the method according to the present embodiment, the correlation at the representative pitch P₂ is strong in the temporary pitch analysis that follows the pitch analysis unit. Therefore, the temporary pitch of the first subframe takes a value near P₂. For this reason, in the delta lag search, the vicinity of the part where the voice rises can be used as the temporary pitch. That is, in the adaptive codebook search of the second subframe, values near P₂ can be searched, and the adaptive codebook search of the second subframe can be performed by delta lag without degradation even if a voice rise occurs partway through the frame.
[0103]
Next, in the sound source creation unit 305, the sound source sample (adaptive code vector or adaptive sound source) stored in the adaptive codebook 303 and the sound source sample (stochastic code vector or stochastic sound source) stored in the probabilistic codebook 304 are used. Each is taken out and sent to the auditory weight LPC synthesis unit 306. Further, the audible weight LPC synthesis unit 306 performs filtering on the two sound sources obtained by the sound source creation unit 305 with the decoded LPC coefficients obtained by the LPC analysis unit 302 to obtain two synthesized sounds.
[0104]
Further, the gain calculation unit 308 analyzes the relationship between the two synthesized sounds obtained by the perceptual weight LPC synthesis unit 306 and the input speech weighted by the perceptual weighting unit 307, and finds the optimum values (optimum gains) of the two synthesized sounds. The gain calculation unit 308 then adds the synthesized sounds, power-adjusted with the optimum gains, to obtain a total synthesized sound, and calculates the coding distortion between the total synthesized sound and the input speech. The gain calculation unit 308 calculates the coding distortion between the input speech and the many synthesized sounds obtained by having the sound source creation unit 305 and the perceptual weight LPC synthesis unit 306 operate on all the sound source samples of the adaptive codebook 303 and the stochastic codebook 304, and finds the index of the sound source sample that gives the smallest coding distortion.
[0105]
Next, the index of the obtained sound source sample, the two sound sources corresponding to the index, and the input speech are sent to the parameter encoding unit 309. The parameter encoding unit 309 obtains a gain code by performing gain encoding, and sends the gain code together with the LPC code and the index of the sound source sample to the transmission line.
[0106]
The parameter encoding unit 309 also creates the actual sound source signal from the gain code and the two sound sources corresponding to the index of the sound source sample, stores it in the adaptive codebook 303, and at the same time discards the old sound source samples.
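The update of the adaptive codebook can be pictured as a fixed-length buffer of past excitation. The following is a minimal sketch under that assumption; the function name update_adaptive_codebook and the list-based buffer are illustrative and not taken from the specification.

```python
def update_adaptive_codebook(history, adaptive_vec, stochastic_vec, gain_a, gain_s):
    """Build the actual excitation and shift it into the adaptive codebook.

    history is the buffer of past excitation samples (oldest first). The new
    excitation is appended and the same number of oldest samples is dropped,
    so the buffer length stays constant.
    """
    if not history:
        raise ValueError("history buffer must not be empty")
    excitation = [gain_a * a + gain_s * s
                  for a, s in zip(adaptive_vec, stochastic_vec)]
    updated = (history + excitation)[-len(history):]
    return updated, excitation
```

Returning a new list instead of modifying the buffer in place keeps the sketch easy to test; a real-time implementation would more likely use a circular buffer.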
[0107]
Note that the perceptual weighting LPC synthesis unit 306 uses a perceptual weighting filter that uses the LPC coefficients, a high-frequency emphasis filter, and long-term prediction coefficients (obtained by performing long-term prediction analysis of the input speech).
[0108]
The gain calculation unit 308 compares the input speech against all the sound sources of the adaptive codebook 303 and the stochastic codebook 304 obtained from the sound source creation unit 305. The two sound sources (adaptive codebook 303 and stochastic codebook 304) are searched by open loop as described above.
[0109]
As described above, the pitch search method in the present embodiment performs pitch analysis of the plurality of subframes constituting the frame and calculates their correlation values before the adaptive codebook search of the first subframe, so that the correlation values of all the subframes can be grasped at the same time.
[0110]
Then, the correlation value of each subframe is calculated, and the value most likely to be the pitch period in each subframe (referred to as the representative pitch) is obtained from the magnitude of the correlation values, in order to set the lag search range of the plurality of subframes. In setting the search range, an appropriate provisional value (referred to as the provisional pitch) that serves as the center of the search range and differs little between subframes is obtained using the representative pitches and correlation values of the plurality of subframes obtained by the pitch analysis.
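The pitch analysis and provisional pitch selection can be sketched as follows. This is only an illustration under stated assumptions: the correlation measure (squared cross-correlation divided by the lagged energy), the window delta, the threshold, and all function names are hypothetical choices, and the selection rule shown is one plausible way of keeping the two provisional pitches close together, not necessarily the rule of the embodiment.

```python
def lag_correlations(signal, start, length, lag_min, lag_max):
    """Correlation value per lag for the subframe signal[start:start+length]."""
    if start < lag_max:
        raise ValueError("not enough past samples for the largest lag")
    corr = {}
    for lag in range(lag_min, lag_max + 1):
        num = sum(signal[n] * signal[n - lag] for n in range(start, start + length))
        den = sum(signal[n - lag] ** 2 for n in range(start, start + length))
        corr[lag] = num * num / den if num > 0 and den > 0 else 0.0
    return corr


def representative_pitch(corr):
    """Lag with the largest correlation value, and that value."""
    lag = max(corr, key=corr.get)
    return lag, corr[lag]


def provisional_pitches(corr1, corr2, delta=4, threshold=0.7):
    """Pick provisional pitches for two subframes (assumed rule).

    If subframe 1 has a lag near the representative pitch of subframe 2 whose
    correlation is at least threshold times its own maximum, use that lag;
    otherwise fall back to the representative pitch of subframe 1.
    """
    p1, v1 = representative_pitch(corr1)
    p2, _ = representative_pitch(corr2)
    near = [lag for lag in corr1 if abs(lag - p2) <= delta]
    q1 = max(near, key=lambda lag: (corr1[lag], -abs(lag - p2)))
    t1 = q1 if corr1[q1] >= threshold * v1 else p1
    return t1, p2
```

With a silent first subframe every correlation value in corr1 is zero, so under this rule the provisional pitch of the first subframe falls on the representative pitch of the second subframe.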
[0111]
Furthermore, since the lag search section is limited to a designated range before and after the provisional pitch obtained by the search range setting, the adaptive codebook can be searched efficiently. In doing so, the number of candidates on the short-lag side is reduced and the range on the long-lag side is set wider, so that an appropriate search range giving good performance can be set. Further, since the lag is searched within the range set by the search range setting during the adaptive codebook search, encoding that yields good decoded sound is possible.
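An asymmetric search range of this kind can be sketched as below. The widths (4 lags below and 11 above the provisional pitch, giving 16 candidates) and the overall lag limits of 20 and 143 are hypothetical values chosen for illustration; they are not stated in the text.

```python
def lag_search_range(provisional_pitch, below=4, above=11, lag_min=20, lag_max=143):
    """Lag candidates around the provisional pitch.

    Fewer candidates are placed on the short-lag side (below) than on the
    long-lag side (above), and the range is clipped to [lag_min, lag_max].
    """
    low = max(lag_min, provisional_pitch - below)
    high = min(lag_max, provisional_pitch + above)
    return range(low, high + 1)


candidates = list(lag_search_range(50))  # lags 46 through 61
```

With these assumed widths an unclipped range holds 16 lags, so a lag relative to the provisional pitch would fit in 4 bits.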
[0112]
Thus, according to the present embodiment, the provisional pitch of the second subframe lies near the provisional pitch of the first subframe obtained by the search range setting section 311, and since the search range is narrowed down in the first subframe, the lag obtained as a result of the search does not stray far from the provisional pitch. Therefore, when the second subframe is searched, the vicinity of the provisional pitch of the second subframe can be searched, and an appropriate lag search is possible in both the first and second subframes even in a non-stationary frame, such as one in which voice starts in the second half of the frame. This is a special effect that has not been obtained conventionally.
[0113]
The speech encoding / decoding according to Embodiments 1 and 2 has been described as a speech encoding device and a speech decoding device, but it may also be configured as software. For example, the speech encoding / decoding program may be stored in a ROM and run by a CPU according to the program. Alternatively, the program, the adaptive codebook, and the stochastic codebook (pulse spreading codebook) may be stored in a computer-readable storage medium, loaded from this storage medium into the RAM of a computer, and operated according to the program. Even in such cases, the same operations and effects as in Embodiments 1 and 2 are exhibited. Furthermore, the program in Embodiments 1 to 3 may be downloaded by a communication terminal and run on that terminal.
[0114]
The first and second embodiments may be implemented individually or in combination.
[0115]
[Effect of the invention]
As described above, the speech coding apparatus of the present invention adjusts the prediction coefficient used for predictive coding according to the state of the previous subframe, so that the prediction coefficient can be controlled for each code vector. Therefore, it is possible to perform more efficient prediction adapted to the local features of the speech and to prevent the adverse effects of prediction in non-stationary parts.
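The adjustment of the prediction coefficient by the state of the previous subframes can be sketched as follows. This is a simplified illustration: the CodeVector fields, the moving-average form of the prediction, and the scalar squared-error distortion are assumptions, whereas the specification computes the distortion from the synthesized sounds.

```python
from collections import deque, namedtuple

# Hypothetical code vector layout: adaptive gain, stochastic gain,
# and the prediction coefficient adjustment coefficient.
CodeVector = namedtuple("CodeVector", ["gain_a", "gain_s", "adjust"])


def predict_stochastic_gain(past_vectors, coeffs):
    """Product sum of past stochastic gains and adjusted prediction coefficients.

    Each prediction coefficient is multiplied by the adjustment coefficient
    stored in the corresponding past code vector (newest first).
    """
    return sum(b * cv.adjust * cv.gain_s for b, cv in zip(coeffs, past_vectors))


def quantize_gain(target_gain_s, codebook, past_vectors, coeffs):
    """Pick the code vector whose decoded gain is closest to the target.

    The chosen vector is pushed onto past_vectors; a deque with maxlen
    discards the oldest state automatically.
    """
    pred = predict_stochastic_gain(past_vectors, coeffs)
    best = min(range(len(codebook)),
               key=lambda i: (target_gain_s - (codebook[i].gain_s + pred)) ** 2)
    past_vectors.appendleft(codebook[best])
    return best, codebook[best].gain_s + pred
```

A code vector with a small adjust value weakens the influence of its own gain on later predictions, which is how an extreme gain in a non-stationary part can be kept from disturbing the following subframes.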
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a wireless communication apparatus provided with a speech encoding apparatus according to the present invention.
FIG. 2 is a block diagram showing a configuration of a speech encoding apparatus according to Embodiment 1 of the present invention.
FIG. 3 is a block diagram showing a configuration of a gain calculation unit in the speech encoding apparatus shown in FIG.
FIG. 4 is a block diagram showing a configuration of a parameter encoding unit in the speech encoding apparatus shown in FIG.
FIG. 5 is a block diagram showing a configuration of a speech decoding apparatus that decodes speech data encoded by the speech encoding apparatus according to Embodiment 1 of the present invention.
FIG. 6 is a diagram for explaining adaptive codebook search.
FIG. 7 is a block diagram showing a configuration of a speech encoding apparatus according to Embodiment 2 of the present invention.
[Explanation of symbols]
102,302 LPC analysis unit
103,303 Adaptive codebook
104,304 Stochastic codebook
105,305 Sound source creation unit
106,306 Perceptual weighting LPC synthesis unit
107,307 Perceptual weighting unit
108,308 Gain calculation unit
109,309 Parameter encoding unit
310 Pitch analysis unit
311 Search range setting section
1091 Parameter calculation unit
1092 Coding distortion calculation unit
1093 Comparison unit
1094 Vector codebook
1095 Prediction coefficient storage unit
1096 Decoded vector storage unit

Claims

A speech encoding apparatus comprising:
LPC synthesis means for obtaining synthesized sounds by filtering an adaptive sound source and a stochastic sound source, stored in an adaptive codebook and a stochastic codebook respectively, using LPC coefficients obtained from input speech;
gain calculation means for finding gains of the adaptive sound source and the stochastic sound source, and for searching for codes of the adaptive sound source and the stochastic sound source using the coding distortion between the input speech and the synthesized sound obtained using the gains; and
parameter encoding means for performing predictive coding of the gains using the adaptive sound source and the stochastic sound source corresponding to the obtained codes,
wherein the parameter encoding means includes:
prediction coefficient storage means for storing a prediction coefficient used for the predictive coding;
decoded vector storage means for storing, as the state of the previous subframe, a past code vector obtained by past quantization performed using a plurality of code vectors prepared in advance, each having as elements an adaptive sound source gain, a stochastic sound source gain, and a prediction coefficient adjustment coefficient related to the stochastic sound source; and
prediction coefficient adjustment means for automatically adjusting the prediction coefficient using the prediction coefficient adjustment coefficient according to the state of the previous subframe.

The speech encoding apparatus according to claim 1, wherein, in the predictive coding, when obtaining the product sum of the gain of the stochastic sound source included in the past code vector and the prediction coefficient, the prediction coefficient adjusting means adjusts the prediction coefficient by multiplying it by the prediction coefficient adjustment coefficient corresponding to the state of the previous subframe.

The speech encoding apparatus according to claim 2, wherein the prediction coefficient adjusting means further calculates the parameters used for the distortion calculation of the predictive coding, using the past code vector, the prediction coefficient, the synthesized sound, the adaptive sound source, and the stochastic sound source.

The speech encoding apparatus according to claim 3, wherein the parameter encoding means further includes:
a vector codebook for storing the plurality of code vectors prepared in advance;
coding distortion calculation means for performing the distortion calculation using the prediction coefficient, the parameters, and the plurality of code vectors; and
comparison means for obtaining, using the results of the distortion calculation and the plurality of code vectors, the code vector that minimizes the coding distortion in the distortion calculation, and for updating the past code vector stored in the decoded vector storage means using the code vector that minimizes the coding distortion.

The speech encoding apparatus according to claim 1, wherein, when the gain of the stochastic sound source included in the past code vector is an extremely large or extremely small value, the prediction coefficient adjusting means adjusts the prediction coefficient using a prediction coefficient adjustment coefficient, according to the state of the previous subframe, that is set so as to reduce the influence of that value.

A computer-readable recording medium storing a speech encoding program, an adaptive codebook storing previously synthesized excitation signals, and a stochastic codebook storing a plurality of excitation vectors,
wherein the speech encoding program includes:
a procedure for obtaining synthesized sounds by filtering the adaptive sound source and the stochastic sound source stored in the adaptive codebook and the stochastic codebook using LPC coefficients obtained from input speech;
a procedure for obtaining gains of the adaptive sound source and the stochastic sound source;
a procedure for searching for codes of the adaptive sound source and the stochastic sound source using the coding distortion between the input speech and the synthesized sound obtained using the gains; and
a procedure for performing predictive coding of the gains using the adaptive sound source and the stochastic sound source corresponding to the obtained codes,
and wherein the procedure for performing the predictive coding includes:
a procedure for storing in a memory, as the state of the previous subframe, a past code vector obtained by past quantization performed using a plurality of code vectors prepared in advance, each having as elements an adaptive sound source gain, a stochastic sound source gain, and a prediction coefficient adjustment coefficient related to the stochastic sound source, and for automatically adjusting the prediction coefficient using the prediction coefficient adjustment coefficient corresponding to the state of the previous subframe;
a procedure for calculating the parameters used for distortion calculation, using the past code vector, the prediction coefficient, the synthesized sound, the adaptive sound source, and the stochastic sound source;
a procedure for performing the distortion calculation using the plurality of code vectors prepared in advance and stored in a vector codebook, the prediction coefficient, and the parameters;
a procedure for obtaining, using the results of the distortion calculation and the plurality of code vectors prepared in advance, the code vector with the smallest coding distortion in the distortion calculation; and
a procedure for updating the past code vector stored in the memory using the code vector with the smallest coding distortion.