JP3874851B2 - Speech encoding device

Info

Publication number: JP3874851B2
Application number: JP25858596A
Authority: JP
Inventors: 田幸司吉
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1996-09-30
Filing date: 1996-09-30
Publication date: 2007-01-31
Anticipated expiration: 2016-09-30
Also published as: JPH10105197A

Description

【０００１】
【発明の属する技術分野】
本発明は、ディジタル携帯電話等のディジタル移動通信端末やボイスメール等に必須な音声信号の符復号化を行う音声符号化装置に関する。
【０００２】
【従来の技術】
従来、４〜８kbps程度のビットレートの音声符号化装置はＣＥＬＰ(Code Excited Linear Prediction)型が主流であり、"Code Excited Linear Prediction(CELP): High-Quality Speech at Very Low Bit Rate" (by M.R.Schroeder and B.S.Atal, Proc.ICASSP'85,pp.937-940,1985), "Improved Speech Quality and Efficient Vector Quantization in SELP" (by W.B.Kleijn, et.al., Proc.ICASSP'88,pp.155-158,1988) に記載されたものが知られている。図７は従来のＣＥＬＰ音声符号化装置の符号化側の構成を示したもので、入力音声に対してＬＰＣ分析および量子化を行うＬＰＣ分析・量子化器９０１、入力音声に対して聴感重み付けを行う聴感重み付けフィルタ９０２、過去の駆動音源を蓄え適応音源符号帳探索時に使用する適応音源符号帳９０３、適応音源にゲインを乗算するゲイン乗算器９０４、雑音音源ベクトルを格納する雑音音源符号帳９０５、雑音音源にゲインを乗算するゲイン乗算器９０６、適応音源と雑音音源を加算する加算器９０７、加算器９０７により得られた駆動音源に対し、聴感重み付きＬＰＣ合成を行う聴感重み付きＬＰＣ合成フィルタ９０８、聴感重み付けフィルタ９０２により得られる聴感重み付き入力音声から聴感重み付きＬＰＣ合成フィルタ９０８により得られる聴感重み付き合成音声を減算する減算器９０９、減算器９０９により得られた誤差信号の二乗誤差を最小にするような適応音源符号、雑音音源符号およびそれらのゲイン符号を決定する聴感重み付き二乗誤差最小化器９１０、符号化により得られるＬＰＣ符号、適応音源符号、雑音音源符号およびそれらのゲイン符号を多重化する多重化器９１１から構成されている。
【０００３】
【発明が解決しようとする課題】
しかしながら、上記従来の音声符号化装置において、適応音源符号帳は、音声の音源信号に含まれるピッチ予測信号を効率的に生成するために設けられたもので、音声信号の母音等の定常区間には有効に機能するが、音声の立ち上がりや周囲雑音の存在する信号には十分に機能せず、低ビットレートでの高音声品質を実現することが困難であるという問題を有していた。
【０００４】
本発明は、上記従来の問題を解決するもので、低ビットレートでの高音声品質を実現することができる優れた音声符号化装置を提供することを目的とする。
【０００５】
【課題を解決するための手段】
上記目的を達成するために、本発明は、適応音源符号帳に加え、過去の駆動音源により符号帳の更新を行うことにより過去の駆動音源の特徴が保持される拡張適応音源符号帳を新たに備え、適応音源符号帳と拡張適応音源符号帳の出力のいずれかを選択できるようにしたものであり、適応音源が有効に働かない場合に、より過去の駆動音源の特徴が保持された拡張適応音源を用いることにより、低ビットレートにおいて高い音声品質を実現できる。
【０００６】
【発明の実施の形態】
本発明の請求項１に記載の発明は、一定区間毎の音源信号の符号化に用いる音源符号帳として、過去の符号化駆動音源を用いて更新することにより得られる拡張適応音源符号帳を少なくとも備え、前記拡張適応音源符号帳が、符号化音源ベクトル長以上のサイズを有する一つ以上の符号帳ベクトルからなり、前記拡張適応音源符号帳の更新が、前記拡張適応音源符号帳内で過去の駆動音源ベクトルと類似な区間を探索により決定し、その区間にその駆動音源ベクトルを加重加算することにより行われる音声符号化装置であり、より過去の駆動音源の特徴が保持された拡張適応音源を用いることにより、低ビットレートにおいてより高い音声品質を実現することができるという作用を有する。
【０００７】
また、本発明の請求項２、３、４に記載の発明は、ＣＥＬＰ型音声符号化の音源符号帳として、過去の駆動音源を順次蓄える適応音源符号帳、過去の駆動音源により符号帳の更新を行うことにより過去の駆動音源の特徴が保持される請求項１と同様な拡張適応音源符号帳、および固定の音源符号ベクトルを蓄えた固定符号帳を備え、さらに拡張適応音源符号帳を過去の駆動音源により更新する拡張適応音源更新器、適応音源符号帳と拡張適応音源符号帳の出力のいずれかを選択する適応音源／拡張適応音源選択器とを備えたものであり、適応音源または拡張適応音源のいずれかと固定音源との和で音源の符号化を行う構成において、適応音源探索および拡張適応音源探索の結果により、または適応音源探索、拡張適応音源探索およびそれらにより得られた音源各々に対する固定音源探索の結果から、いずれか聴感重み付き二乗誤差の小さい方の音源を選択することにより、固定音源探索の結果も含めたより高い音声品質を実現することができるという作用を有する。
【０００８】
また、本発明の請求項５に記載の発明は、入力音声または聴感重み付き入力音声に対してピッチ分析を行い、ピッチ予測ゲイン等のピッチ周期性度合いを表す値を出力するピッチ分析器を備えたものであり、適応音源／拡張適応音源選択器における適応音源と拡張適応音源の選択時にピッチ周期性度合いの値を参照し、ピッチ周期性度合いがあるしきい値以上に高い場合に、適応音源が無条件で選択されるようにして、入力音声の定常区間で拡張適応音源が選択される場合に生じうる音声品質の劣化を防ぐという作用を有する。
【０００９】
また、本発明の請求項６に記載の発明は、適応音源の出力ラグ値、拡張適応音源信号、過去の駆動音源信号を入力とし、拡張適応音源が選択された場合を含めて、当該区間におけるピッチ周期ラグ値を固定音源へ出力するピッチ周期ラグ算出器を備えたものであり、拡張適応音源選択時にも、固定音源探索時にピッチ周期情報を用いる方法を適用することができ、より音声品質の向上を図ることができるという作用を有する。
【００１０】
以下、本発明の実施の形態について、図１から図６を用いて説明する。
（実施の形態１）
図１は第１の発明のＣＥＬＰ型音声符号化装置の符号化側のブロック図を示したものである。図１において、１０１は入力音声に対してＬＰＣ分析および量子化を行うＬＰＣ分析・量子化器、１０２は入力音声に対して聴感重み付けを行う聴感重み付けフィルタ、１０３は過去の駆動音源の特徴が保持されている拡張適応音源符号帳である。この拡張適応音源符号帳１０３は、符号化音源ベクトル長以上のサイズを有する一つ以上の符号帳ベクトルからなる。ここで、符号帳ベクトルが一つの場合、その符号帳ベクトルを、
EAC n , n=0,1,...,L+N-1
とすると、この符号帳から得られる第k 番目の拡張適応音源ベクトルdk n は、dk n = EAC k+n , n=0,1,...,N-1;k=0,1,...,L-1
と表される。ただし、Ｌは符号帳サイズ、Ｎは音源ベクトル長である。１０４は拡張適応音源符号帳出力に対してゲインを乗じるゲイン乗算器、１０５は過去の駆動音源を蓄え適応音源符号帳探索時に使用する適応音源符号帳、１０６は適応符号音源に対してゲインを乗じるゲイン乗算器、１０７は拡張適応音源符号帳１０３と適応音源符号帳１０５の出力のいずれかを選択する適応音源／拡張適応音源選択器、１０８は固定の音源符号ベクトルを格納した固定符号帳、１０９は固定音源にゲインを乗算するゲイン乗算器、１１０は適応音源／拡張適応音源のいずれかと固定音源を加算する加算器、１１１は加算器１１０により得られた駆動音源に対し聴感重み付きＬＰＣ合成を行う聴感重み付きＬＰＣ合成フィルタ、１１２は聴感重み付けフィルタ１０２により得られる聴感重み付き入力音声から聴感重み付きＬＰＣ合成フィルタ１１１により得られる聴感重み付き合成音声を減算する減算器、減算器１１２により得られた誤差信号の二乗誤差を最小にするような適応音源または拡張適応音源符号と適応音源／拡張適応音源選択情報、雑音音源符号およびそれらのゲイン符号を決定する聴感重み付き二乗誤差最小化器、１１４は符号化後の駆動音源を用いて拡張適応音源符号帳を更新する拡張適応音源符号帳更新器、１１５は符号化により得られるＬＰＣ符号、適応音源または拡張適応音源符号と適応音源／拡張適応音源選択情報、固定音源符号およびそれらのゲイン符号を多重化する多重化器である。
【００１１】
以上のように構成された音声符号化装置について、図１を用いてその動作を説明する。まず従来のＣＥＬＰ符号化装置と同様、入力音声に対してＬＰＣ分析・量子化器１０１によりＬＰＣ分析および量子化を行い、得られた量子化ＬＰＣ係数を用いて聴感重み付けフィルタ１０２により聴覚重み付き入力音声を得る。一方、拡張適応音源符号帳１０３とゲイン乗算器１０４または適応音源符号帳１０５とゲイン乗算器１０６により得られる拡張適応音源または適応音源と、固定符号帳１０８とゲイン乗算器１０９により得られる固定音源とを加算器１１０により加算して得られる駆動音源信号を聴感重み付きＬＰＣ合成フィルタ１１１により合成し、得られた合成信号を聴覚重み付き入力音声から減算器１１２により減算し、得られた誤差信号の二乗誤差を最小にするような適応音源または拡張適応音源符号と適応音源／拡張適応音源選択情報、雑音音源符号およびそれらのゲイン符号を決定する。
【００１２】
ここで、聴感重み付き二乗誤差を最小にするような適応音源または拡張適応音源符号と適応音源／拡張適応音源選択情報、雑音音源符号およびそれらのゲイン符号の決定は、図２に示すような手順で順次決定する。まず、拡張適応音源符号帳探索および適応音源符号帳探索を各々独立に行い（Ｓ１０１、Ｓ１０２）、拡張適応音源符号帳および適応音源符号帳から最適な符号ベクトルを決定する。ここで、拡張適応音源符号帳探索においては、拡張適応音源符号帳内で、聴感重み付き二乗誤差を最小にする符号ベクトルdk n (n=0 〜N-1)を求めるか、または入力音声からＬＰＣ予測によりＬＰＣ残差信号を求めその残差信号との相関を最大にするような拡張適応音源符号ベクトルを求めてもよい。あるいは、残差信号との相関最大化により複数の符号ベクトル候補を求めておき、その中から聴感重み付き二乗誤差を最小にするものを選択してもよい。次に、得られた拡張適応音源と適応音源のうち、聴感重み付き二乗誤差の小さい方の音源を選択する（Ｓ１０３）。そして、選択された音源（拡張適応音源または適応音源）に対して固定音源探索を行う（Ｓ１０４）。最後に、選択された拡張適応音源または適応音源および固定音源に対するゲイン符号をゲイン探索により決定し、駆動音源符号化を終了する（Ｓ１０５）。なお、ゲイン探索は図２の手順に示すような音源符号化の最後にまとめて行う方法以外に、拡張適応音源または適応音源のゲインと固定音源のゲインを逐次的に決定することも可能である。
【００１３】
音源符号化により決定された適応音源または拡張適応音源符号と適応音源／拡張適応音源選択情報、雑音音源符号およびそれらのゲイン符号は、ＬＰＣ分析および量子化を行うＬＰＣ分析・量子化器１０１により得られるＬＰＣ符号と共に、多重化器１１５により多重化され、符号化データとして出力される。また、拡張適応音源符号帳および適応音源符号帳は、符号化された駆動音源信号を用いて更新される。このうち、拡張適応音源符号帳の更新は、拡張適応音源符号帳更新器１１４を用いて以下のように行われる。まず、更新前の拡張適応音源符号帳ベクトルEAC n (n=0〜L+N-1)（符号帳ベクトルが一つの場合）に対して、符号化駆動音源e n (n=0〜N-1)と符号帳ベクトル内で類似する区間を探索する。方法としては、拡張適応音源符号帳ベクトルと駆動音源との相互相関最大化またはゲイン正規化後予測ゲイン最大化等による。それにより得られた符号帳内の類似区間に対して、以下により符号帳を更新する。

ここで、
Ｍ：類似区間先頭サンプル
α：更新係数
【００１４】
次に、本実施の形態におけるＣＥＬＰ型音声符号化装置の復号側構成を図３を参照して説明する。図３において、１５１は符号化データに対して、ＬＰＣ符号、適応音源または拡張適応音源符号と適応音源／拡張適応音源選択情報、雑音音源符号およびそれらのゲイン符号を分離する分離器、１５２は過去の駆動音源の特徴が保持されている拡張適応音源符号帳であり、その内容は符号化側と同様である。１５３は拡張適応音源符号帳出力に対して拡張適応音源ゲイン符号から得られるゲインを乗じるゲイン乗算器、１５４は過去の駆動音源を蓄えた適応音源符号帳、１５５は適応音源に対して適応音源ゲイン符号から得られるゲインを乗じるゲイン乗算器、１５６は拡張適応音源符号帳１５２と適応音源符号帳１５４の出力のいずれかを選択する適応音源／拡張適応音源選択器、１５７は固定の音源符号ベクトルを格納した固定符号帳、１５８は固定音源に固定音源ゲイン符号から得られるゲインを乗算するゲイン乗算器、１５９は適応音源／拡張適応音源のいずれかと固定音源を加算する加算器、１６０はＬＰＣ符号からＬＰＣ係数を復号するＬＰＣ復号器、１６１は加算器１５９により得られた駆動音源に対しＬＰＣ合成を行うＬＰＣ合成フィルタである。
【００１５】
以上のように、構成された音声符号化装置の復号側について、図３を用いてその動作を説明する。まず分離器１５１により符号化データに対して、ＬＰＣ符号、適応音源または拡張適応音源符号と適応音源／拡張適応音源選択情報、固定音源符号およびそれらのゲイン符号を分離する。次に、適応音源／拡張適応音源選択情報により、適応音源または拡張適応音源とそのゲイン符号より適応音源または拡張適応音源いずれか符号化側で選択された音源を生成し、固定音源符号およびそのゲイン符号を用いて生成された固定音源と加算器１５９で加算し、駆動音源を生成する。そして、ＬＰＣ復号器１６０で復号されたＬＰＣ係数を用いてＬＰＣ合成フィルタ１６１で駆動音源に対しＬＰＣ合成を行い復号音声を得る。最後に、拡張適応音源符号帳１５２および適応音源符号帳１５４が、符号化された駆動音源信号を用いて拡張適応音源符号帳更新器１６２により更新される。拡張適応音源符号帳の更新方法は符号化側と同様である。
【００１６】
以上のように、本発明の第１の実施の形態によれば、過去の駆動音源の特徴が保持されている拡張適応音源符号帳を新たに設け、適応音源または拡張適応音源のいずれかと固定音源との和で駆動音源を表現し、適応音源探索および拡張適応音源探索の結果から、いずれか聴感重み付き二乗誤差の小さい方の音源を選択することにより、特に音声の立ち上がりや周囲騒音を含む信号に対し、従来の適応音源のみではうまく表現できなかった音源をより正確に表現でき、より高い音声品質を実現することができる。
【００１７】
なお、図１に示したＣＥＬＰ型の全体構成は一つの典型的な例であり、本発明は他のＣＥＬＰ型の構成にも適用可能である。
【００１８】
（実施の形態２）
次に、本発明の第２の実施の形態について説明する。第２の実施の形態におけるＣＥＬＰ型音声符号化装置の構成は、図１に示す符号化側および図３に示す復号側の構成と同じであるが、音源探索の動作手順が異なる。図４は第２の実施の形態の音源探索の動作手順を示したもので、第１の実施の形態とは異なり、拡張適応音源符号帳探索および適応音源符号帳探索それぞれに対して固定符号帳探索を行った結果に対して拡張適応音源か適応音源の選択をするものである。図４において、まず適応音源符号帳探索を行い（Ｓ２０１）、それにより得られた適応音源に対して固定音源探索を行った後（Ｓ２０２）、適応音源と固定音源のゲイン符号化を行い（Ｓ２０３）、最適な適応音源、固定音源およびそれらのゲイン符号を決定する。次に、拡張適応音源に対しても同様に拡張適応音源符号帳探索、固定音源探索およびゲイン符号化を行い（Ｓ２０４、Ｓ２０５、Ｓ２０６）、最適な拡張適応音源、固定音源およびそれらのゲイン符号を決定する。そして、得られた適応音源とそれに対応する固定音源および拡張適応音源とそれに対応する固定音源の組み合わせのうち、聴感重み付き二乗誤差が小さい方を最終的に選択する（Ｓ２０７）。なお、ゲイン符号化は、図４では固定音源探索後に行うようになっているが、適応音源符号帳探索または拡張適応音源符号帳探索直後にそれらのゲインを単独で符号化する方法も可能である。なお、上記した以外の動作は、拡張適応音源符号帳の更新も含め第１の実施の形態と同様である。
【００１９】
以上のように、本発明の第２の実施の形態によれば、過去の駆動音源の特徴が保持されている拡張適応音源符号帳を新たに設け、適応音源または拡張適応音源のいずれかと固定音源との和で駆動音源を表現し、適応音源および拡張適応音源いずれか最適な音源を選択できるようにし、更に適応音源と拡張適応音源の選択法として、適応音源探索および拡張適応音源探索各々に対して固定音源探索を行った結果に対して聴感重み付き二乗誤差の小さい方の音源を選択するように動作することにより、特に音声の立ち上がりや周囲騒音を含む信号に対し、従来の適応音源のみではうまく表現できなかった音源をより正確に表現でき、より高い音声品質を実現することができる。
【００２０】
（実施の形態３）
図５は本発明の第３の実施の形態におけるＣＥＬＰ型音声符号化装置の符号化側の構成を示したものである。図５において、１０１〜１１５までは第１および第２の実施の形態を示す図１と同一であり、図１と異なるのは、入力音声信号に対してピッチ分析を行い、ピッチ予測ゲイン等のピッチ周期性度合いを表す値を出力するピッチ分析器５０１を備えていることである。
【００２１】
以上のように構成された音声符号化装置について、図５を用いてその動作を説明する。ここでは、第１および第２の実施の形態と異なるピッチ分析器５０１とその結果に基づいた適応音源／拡張適応音源選択器１０７の動作についてのみ説明する。それ以外の動作は第１および第２の実施の形態と同一である。ピッチ分析器５０１では、入力音声を用いてピッチ分析を行い、ピッチ予測ゲインや正規化最大相互相関値等のピッチ周期性度合いを表す値を出力する。なお、ピッチ分析は、図５では入力音声に対して行うようになっているが、聴感重み付けされた入力信号に対して行うことも可能である。そして、適応音源／拡張適応音源選択器１０７における適応音源と拡張適応音源の選択時に、ピッチ周期性度合いの値を参照し、ピッチ周期性度合いがあるしきい値以上に高い場合に、適応音源が無条件で選択されるように動作する。音声の母音定常区間等のピッチ周期性度合いが高い場合は、一般に適応音源の方が拡張適応音源より望ましく、このような区間で拡張適応音源が選択された場合、稀にかえって劣化を招く場合があり、本実施形態のような構成により、それを防ぐことができる。なお、本実施の形態は、第１および第２の実施の形態のいずれに対しても適用することができる。
【００２２】
（実施の形態４）
図６は本発明の第４の実施の形態におけるＣＥＬＰ型音声符号化装置の符号化側の構成を示したものである。図６において、１０１〜１１５までは第１および第２の実施の形態を示す図１と同じであり、図１と異なるのは、適応音源の出力ラグ値、拡張適応音源信号、過去の駆動音源信号を入力とし、拡張適応音源が選択された場合を含めて、当該区間におけるピッチ周期ラグ値を固定音源符号帳へ出力するピッチ周期ラグ算出器６０１を備えたことである。
【００２３】
以上のように構成された音声符号化装置について、図６を用いてその動作を説明する。ここでは、第１および第２の実施の形態と異なるピッチ周期ラグ算出器６０１と固定音源探索の動作についてのみ説明する。それ以外の動作は第１および第２の実施の形態と同一である。ピッチ周期ラグ算出器６０１において、適応音源の出力ラグ値、拡張適応音源信号、過去の駆動音源信号を入力とし、拡張適応音源が選択された場合を含めて、当該区間におけるピッチ周期ラグ値を固定音源符号帳へ出力する。ここで、適応音源が選択された場合は、その探索の結果得られる適応音源ラグをそのまま出力する。一方、拡張適応音源が選択された場合にも、当該区間のピッチ周期ラグに相当する値を算出して出力する。算出方法としては、当該区間の拡張適応音源信号および過去の駆動音源信号から最大残差相関算出等によりピッチ周期ラグを求めるか、または当該区間の前区間で得られた適応音源ラグ（前区間が適応音源が選択されていない場合は、その以前の適応音源ラグ）をそのまま用いる。ただし、最大残差相関値が小さい等、ピッチ周期性が低い場合は、そのような状態である旨の情報を出力する。そして、出力されたピッチ周期ラグは、その情報が必要なタイプの固定音源探索において使用される。これは、例えば雑音音源をラグの周期で音源符号化長の長さだけ繰り返す処理や、パルス音源をやはりラグの周期で繰り返し立てるというような場合である。拡張適応音源は主に音声の立ち上がり等、適応音源が有効に機能しない区間で効果を出すことができるが、母音等の定常区間において拡張適応音源が選択される場合もあり、そのような場合や、音源符号化区間長より短いピッチ周期ラグを持つような音声に対してその立上り区間で拡張適応音源が選択された場合に、ピッチ周期ラグ算出器６０１により得られるピッチ周期ラグを算出し、ピッチ周期ラグを必要とするタイプの固定音源探索に用いることにより、より高品質な符号化を行うことができる。
【００２４】
【発明の効果】
本発明は、上記第１および第２の実施の形態から明らかなように、過去の駆動音源の特徴が保持されている拡張適応音源符号帳を新たに設け、適応音源または拡張適応音源のいずれかと固定音源との和で駆動音源を表現し、適応音源探索および拡張適応音源探索の結果から、いずれか聴感重み付き二乗誤差の小さい方の音源を選択するものであり、より過去の駆動音源の特徴が保持された拡張適応音源を用いることにより、低ビットレートにおいてより高い音声品質を実現することができる。特に音声の立ち上がりや周囲騒音を含む信号に対し、従来の適応音源のみではうまく表現できなかった音源をより正確に表現でき、より高い音声品質を実現することができる。
【００２６】
また本発明は、上記第３の実施の形態から明らかなように、ピッチ分析器により得られるピッチ周期性度合いを表す値を、適応音源／拡張適応音源選択器における適応音源と拡張適応音源の選択時に参照し、ピッチ周期性度合いがあるしきい値以上に高い場合に、適応音源を無条件で選択することにより、音声の母音定常区間等のピッチ周期性度合いが高い区間に拡張適応音源が選択された場合に生じうる劣化を防ぐことができる。
【００２７】
また本発明は、上記第４の実施の形態から明らかなように、適応音源の出力ラグ値、拡張適応音源信号、過去の駆動音源信号を入力とし、拡張適応音源が選択された場合を含めて、当該区間におけるピッチ周期ラグ値を固定音源へ出力するピッチ周期ラグ算出器を備えることにより、拡張適応音源選択時にも、固定音源探索時にピッチ周期情報を用いる方法を適用することができ、より音声品質の向上を図ることができるという効果を有する。
【図面の簡単な説明】
【図１】本発明の実施の形態１、２における音声符号化装置の符号化側のブロック図
【図２】本発明の実施の形態１における音源符号化部の動作手順を示すフロー図
【図３】本発明の実施の形態１、２における音声符号化装置の復号側のブロック図
【図４】本発明の実施の形態２における音源符号化部の動作手順を示すフロー図
【図５】本発明の実施の形態３における音声符号化装置の符号化側のブロック図
【図６】本発明の実施の形態４における音声符号化装置の符号化側のブロック図
【図７】従来の音声符号化装置のブロック図
【符号の説明】
１０１ＬＰＣ分析・量子化器
１０２聴感重み付けフィルタ
１０３拡張適応音源符号帳
１０４ゲイン乗算器
１０５適応音源符号帳
１０６ゲイン乗算器
１０７適応音源／拡張適応音源選択器
１０８固定音源符号帳
１０９ゲイン乗算器
１１０加算器
１１１聴感重み付きＬＰＣ合成フィルタ
１１２減算器
１１３二乗誤差最小化器
１１４拡張適応音源符号帳更新器
１１５多重化器
１５１分離器
１５２拡張適応音源符号帳
１５３ゲイン乗算器
１５４適応音源符号帳
１５５ゲイン乗算器
１５６適応音源／拡張適応音源選択器
１５７固定音源符号帳
１５８ゲイン乗算器
１５９加算器
１６０ＬＰＣ復号器
１６１ＬＰＣ合成フィルタ
５０１ピッチ分析器
６０１ピッチ周期ラグ算出器
９０１ＬＰＣ分析・量子化器
９０２聴感重み付けフィルタ
９０３適応音源符号帳
９０４ゲイン乗算器
９０５固定音源符号帳
９０６ゲイン乗算器
９０７加算器
９０８聴感重み付きＬＰＣ合成フィルタ
９０９減算器
９１０二乗誤差最小化器
９１１多重化器
[0001]
[Technical Field of the Invention]
The present invention relates to a speech coding apparatus that encodes and decodes speech signals, a function essential to digital mobile communication terminals such as digital mobile phones and to voice mail systems.
[0002]
[Prior art]
Conventionally, speech coding apparatuses operating at bit rates of about 4 to 8 kbps have mainly been of the CELP (Code Excited Linear Prediction) type, and those described in "Code Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rate" (M. R. Schroeder and B. S. Atal, Proc. ICASSP'85, pp. 937-940, 1985) and "Improved Speech Quality and Efficient Vector Quantization in SELP" (W. B. Kleijn et al., Proc. ICASSP'88, pp. 155-158, 1988) are known. FIG. 7 shows the configuration of the encoding side of a conventional CELP speech coding apparatus. It comprises: an LPC analyzer/quantizer 901 that performs LPC analysis and quantization on the input speech; a perceptual weighting filter 902 that applies perceptual weighting to the input speech; an adaptive excitation codebook 903 that stores past driving excitations and is used in the adaptive excitation codebook search; a gain multiplier 904 that multiplies the adaptive excitation by a gain; a noise excitation codebook 905 that stores noise excitation vectors; a gain multiplier 906 that multiplies the noise excitation by a gain; an adder 907 that adds the adaptive excitation and the noise excitation; a perceptually weighted LPC synthesis filter 908 that performs perceptually weighted LPC synthesis on the driving excitation obtained by the adder 907; a subtractor 909 that subtracts the perceptually weighted synthesized speech obtained by the perceptually weighted LPC synthesis filter 908 from the perceptually weighted input speech obtained by the perceptual weighting filter 902; a perceptually weighted squared-error minimizer 910 that determines the adaptive excitation code, the noise excitation code, and their gain codes so as to minimize the squared error of the error signal obtained by the subtractor 909; and a multiplexer 911 that multiplexes the LPC code, the adaptive excitation code, the noise excitation code, and their gain codes obtained by the encoding.
[0003]
[Problems to be solved by the invention]
However, in the conventional speech coding apparatus described above, the adaptive excitation codebook is provided to efficiently generate the pitch prediction signal contained in the excitation signal of speech. It functions effectively in stationary sections of the speech signal such as vowels, but it does not function sufficiently at speech onsets or for signals containing ambient noise, which makes it difficult to achieve high speech quality at a low bit rate.
[0004]
The present invention solves the above-described conventional problems, and an object thereof is to provide an excellent speech encoding apparatus capable of realizing high speech quality at a low bit rate.
[0005]
[Means for Solving the Problems]
In order to achieve the above object, the present invention newly provides, in addition to the adaptive excitation codebook, an extended adaptive excitation codebook that retains the characteristics of past driving excitations by being updated with those excitations, and allows either the output of the adaptive excitation codebook or the output of the extended adaptive excitation codebook to be selected. When the adaptive excitation does not work effectively, using the extended adaptive excitation, which retains the characteristics of driving excitations from further in the past, makes it possible to realize high speech quality at a low bit rate.
[0006]
[Embodiments of the Invention]
The invention according to claim 1 of the present invention is a speech coding apparatus that comprises, as an excitation codebook used for encoding the excitation signal in each fixed section, at least an extended adaptive excitation codebook obtained by updating with past encoded driving excitations, wherein the extended adaptive excitation codebook consists of one or more codebook vectors each having a size equal to or larger than the encoded excitation vector length, and the extended adaptive excitation codebook is updated by searching within it for a section similar to a past driving excitation vector and adding that driving excitation vector to the section with weighting. By using the extended adaptive excitation, which retains the characteristics of driving excitations from further in the past, higher speech quality can be realized at a low bit rate.
[0007]
The inventions according to claims 2, 3, and 4 of the present invention comprise, as excitation codebooks for CELP speech coding, an adaptive excitation codebook that sequentially stores past driving excitations, an extended adaptive excitation codebook similar to that of claim 1 that retains the characteristics of past driving excitations by being updated with them, and a fixed codebook that stores fixed excitation code vectors. They further comprise an extended adaptive excitation updater that updates the extended adaptive excitation codebook with past driving excitations, and an adaptive/extended adaptive excitation selector that selects either the output of the adaptive excitation codebook or the output of the extended adaptive excitation codebook. In a configuration that encodes the excitation as the sum of the fixed excitation and either the adaptive excitation or the extended adaptive excitation, the excitation with the smaller perceptually weighted squared error is selected based either on the results of the adaptive excitation search and the extended adaptive excitation search, or on the results of those searches together with the fixed excitation search performed for each of the excitations they yield. This makes it possible to realize higher speech quality, in the latter case taking the result of the fixed excitation search into account as well.
[0008]
The invention according to claim 5 of the present invention comprises a pitch analyzer that performs pitch analysis on the input speech or the perceptually weighted input speech and outputs a value representing the degree of pitch periodicity, such as the pitch prediction gain. When the adaptive/extended adaptive excitation selector chooses between the adaptive excitation and the extended adaptive excitation, it refers to this value, and when the degree of pitch periodicity is at or above a certain threshold, the adaptive excitation is selected unconditionally. This prevents the degradation in speech quality that can occur when the extended adaptive excitation is selected in a stationary section of the input speech.
[0009]
The invention according to claim 6 of the present invention comprises a pitch period lag calculator that takes as inputs the output lag value of the adaptive excitation, the extended adaptive excitation signal, and the past driving excitation signal, and outputs the pitch period lag value for the current section to the fixed excitation, including when the extended adaptive excitation has been selected. As a result, a method that uses pitch period information in the fixed excitation search can be applied even when the extended adaptive excitation is selected, which further improves speech quality.
[0010]
Hereinafter, embodiments of the present invention will be described with reference to FIGS. 1 to 6.
(Embodiment 1)
FIG. 1 is a block diagram of the encoding side of the CELP speech coding apparatus of the first invention. In FIG. 1, 101 is an LPC analyzer/quantizer that performs LPC analysis and quantization on the input speech, 102 is a perceptual weighting filter that applies perceptual weighting to the input speech, and 103 is an extended adaptive excitation codebook that retains the characteristics of past driving excitations. The extended adaptive excitation codebook 103 consists of one or more codebook vectors each having a size equal to or larger than the encoded excitation vector length. When there is a single codebook vector and that vector is written as
$EAC_n, \quad n = 0, 1, \ldots, L+N-1$
the k-th extended adaptive excitation vector $d^k_n$ obtained from this codebook is expressed as
$d^k_n = EAC_{k+n}, \quad n = 0, 1, \ldots, N-1; \; k = 0, 1, \ldots, L-1$
where L is the codebook size and N is the excitation vector length. 104 is a gain multiplier that multiplies the output of the extended adaptive excitation codebook by a gain, 105 is an adaptive excitation codebook that stores past driving excitations and is used in the adaptive excitation codebook search, 106 is a gain multiplier that multiplies the adaptive excitation by a gain, 107 is an adaptive/extended adaptive excitation selector that selects either the output of the extended adaptive excitation codebook 103 or the output of the adaptive excitation codebook 105, 108 is a fixed codebook that stores fixed excitation code vectors, 109 is a gain multiplier that multiplies the fixed excitation by a gain, 110 is an adder that adds the fixed excitation to either the adaptive or the extended adaptive excitation, 111 is a perceptually weighted LPC synthesis filter that performs perceptually weighted LPC synthesis on the driving excitation obtained by the adder 110, 112 is a subtractor that subtracts the perceptually weighted synthesized speech obtained by the perceptually weighted LPC synthesis filter 111 from the perceptually weighted input speech obtained by the perceptual weighting filter 102, 113 is a perceptually weighted squared-error minimizer that determines the adaptive or extended adaptive excitation code, the adaptive/extended adaptive excitation selection information, the noise excitation code, and their gain codes so as to minimize the squared error of the error signal obtained by the subtractor 112, 114 is an extended adaptive excitation codebook updater that updates the extended adaptive excitation codebook using the encoded driving excitation, and 115 is a multiplexer that multiplexes the LPC code, the adaptive or extended adaptive excitation code, the adaptive/extended adaptive excitation selection information, the fixed excitation code, and their gain codes obtained by the encoding.
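As an illustration only, the indexing of the single-vector extended adaptive excitation codebook can be sketched as follows. The function name and the use of a Python list are assumptions made for this sketch and do not appear in the original text; only the relation between the codebook vector, L, N, and k is taken from the description above.

```python
def extended_adaptive_vector(eac, k, n_len):
    """Return the k-th extended adaptive excitation vector.

    eac   : the single codebook vector, holding L + N samples
    k     : vector index, 0 <= k <= L - 1
    n_len : excitation vector length N
    """
    size = len(eac) - n_len  # codebook size L, since len(eac) == L + N
    if not 0 <= k < size:
        raise IndexError("k must satisfy 0 <= k <= L - 1")
    # The vector is the N consecutive samples starting at offset k.
    return eac[k:k + n_len]
```

For example, with a ten-sample codebook vector and N = 4 (so L = 6), index k = 2 yields the samples at positions 2 to 5.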
[0011]
The operation of the speech coding apparatus configured as described above will be described with reference to FIG. 1. First, as in a conventional CELP coding apparatus, the LPC analyzer/quantizer 101 performs LPC analysis and quantization on the input speech, and the perceptual weighting filter 102 uses the resulting quantized LPC coefficients to obtain the perceptually weighted input speech. Meanwhile, the extended adaptive excitation obtained from the extended adaptive excitation codebook 103 and the gain multiplier 104, or the adaptive excitation obtained from the adaptive excitation codebook 105 and the gain multiplier 106, is added by the adder 110 to the fixed excitation obtained from the fixed codebook 108 and the gain multiplier 109. The resulting driving excitation signal is synthesized by the perceptually weighted LPC synthesis filter 111, the subtractor 112 subtracts the synthesized signal from the perceptually weighted input speech, and the adaptive or extended adaptive excitation code, the adaptive/extended adaptive excitation selection information, the noise excitation code, and their gain codes are determined so as to minimize the squared error of the resulting error signal.
[0012]
Here, the adaptive or extended adaptive excitation code, the adaptive/extended adaptive excitation selection information, the noise excitation code, and their gain codes that minimize the perceptually weighted squared error are determined sequentially by the procedure shown in FIG. 2. First, the extended adaptive excitation codebook search and the adaptive excitation codebook search are performed independently of each other (S101, S102), and the optimal code vector is determined from each codebook. In the extended adaptive excitation codebook search, either the code vector $d^k_n$ (n = 0 to N-1) that minimizes the perceptually weighted squared error is found within the extended adaptive excitation codebook, or an LPC residual signal is obtained from the input speech by LPC prediction and the extended adaptive excitation code vector that maximizes the correlation with that residual signal is found. Alternatively, a plurality of candidate code vectors may be obtained by maximizing the correlation with the residual signal, and the candidate that minimizes the perceptually weighted squared error may then be selected. Next, of the extended adaptive excitation and the adaptive excitation thus obtained, the one with the smaller perceptually weighted squared error is selected (S103). A fixed excitation search is then performed for the selected excitation (extended adaptive or adaptive) (S104). Finally, the gain codes for the selected extended adaptive or adaptive excitation and for the fixed excitation are determined by a gain search, and the driving excitation encoding ends (S105). Besides performing the gain search all at once at the end of the excitation encoding as in the procedure of FIG. 2, the gain of the extended adaptive or adaptive excitation and the gain of the fixed excitation may also be determined sequentially.
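The search and selection of steps S101 to S103 can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the procedure of the original disclosure: the perceptually weighted synthesis filtering is omitted (each candidate is compared with the target directly), the gain is taken as the unquantized least-squares optimum, and all function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def min_squared_error(target, candidate):
    """Squared error ||target - g * candidate||^2 at the optimal gain g."""
    energy = dot(candidate, candidate)
    if energy == 0.0:
        return dot(target, target)
    return dot(target, target) - dot(target, candidate) ** 2 / energy

def search_extended_codebook(target, eac, n_len):
    """S101 (sketch): try every start k = 0 .. L-1 and keep the best one."""
    best_k, best_err = 0, float("inf")
    for k in range(len(eac) - n_len):
        err = min_squared_error(target, eac[k:k + n_len])
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err

def select_first_stage(err_adaptive, err_extended):
    """S103 (sketch): keep the excitation with the smaller error."""
    return "adaptive" if err_adaptive <= err_extended else "extended"
```

The fixed excitation search (S104) and the gain search (S105) would then run only for the excitation returned by `select_first_stage`.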
[0013]
The adaptive or extended adaptive excitation code, the adaptive/extended adaptive excitation selection information, the noise excitation code, and their gain codes determined by the excitation encoding are multiplexed by the multiplexer 115 together with the LPC code obtained by the LPC analyzer/quantizer 101, and are output as encoded data. The extended adaptive excitation codebook and the adaptive excitation codebook are then updated using the encoded driving excitation signal. Of these, the extended adaptive excitation codebook is updated by the extended adaptive excitation codebook updater 114 as follows. First, the extended adaptive excitation codebook vector before the update, $EAC_n$ (n = 0 to L+N-1) (for the case of a single codebook vector), is searched for the section that is similar to the encoded driving excitation $e_n$ (n = 0 to N-1). The search is based on, for example, maximizing the cross-correlation between the extended adaptive excitation codebook vector and the driving excitation, or maximizing the prediction gain after gain normalization. The codebook is then updated over the similar section thus found, as follows.

where
M: first sample of the similar section
α: update coefficient
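The update equation itself is not reproduced in this text, so the following is only a sketch of one plausible reading of "weighted addition" over the similar section. The convex-combination form (1 - α)·EAC + α·e, the energy-normalized cross-correlation used as the similarity measure, and the function names are all assumptions made for illustration and are not confirmed by the source.

```python
def find_similar_section(eac, e):
    """Return the start sample M of the section of eac most similar to e.

    Similarity here is the cross-correlation normalized by the section
    energy (an assumed measure); starts M = 0 .. L-1 are examined.
    """
    n = len(e)
    best_m, best_score = 0, float("-inf")
    for m in range(len(eac) - n):
        seg = eac[m:m + n]
        corr = sum(a * b for a, b in zip(seg, e))
        energy = sum(a * a for a in seg)
        score = corr / energy ** 0.5 if energy > 0.0 else 0.0
        if score > best_score:
            best_m, best_score = m, score
    return best_m

def update_extended_codebook(eac, e, alpha):
    """Blend the coded excitation e into the similar section (assumed form)."""
    m = find_similar_section(eac, e)
    updated = list(eac)
    for n, sample in enumerate(e):
        # Assumed weighted addition: (1 - alpha) * old + alpha * new.
        updated[m + n] = (1.0 - alpha) * eac[m + n] + alpha * sample
    return updated, m
```

With this form, a small α changes the codebook slowly and so preserves features of older excitations, while α close to 1 overwrites the similar section with the newest excitation.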
[0014]
Next, the configuration of the decoding side of the CELP speech coding apparatus according to this embodiment will be described with reference to FIG. 3. In FIG. 3, 151 is a separator that separates the encoded data into the LPC code, the adaptive or extended adaptive excitation code, the adaptive/extended adaptive excitation selection information, the noise excitation code, and their gain codes, and 152 is an extended adaptive excitation codebook that retains the characteristics of past driving excitations, with the same contents as on the encoding side. 153 is a gain multiplier that multiplies the output of the extended adaptive excitation codebook by the gain obtained from the extended adaptive excitation gain code, 154 is an adaptive excitation codebook that stores past driving excitations, 155 is a gain multiplier that multiplies the adaptive excitation by the gain obtained from the adaptive excitation gain code, 156 is an adaptive/extended adaptive excitation selector that selects either the output of the extended adaptive excitation codebook 152 or the output of the adaptive excitation codebook 154, 157 is a fixed codebook that stores fixed excitation code vectors, 158 is a gain multiplier that multiplies the fixed excitation by the gain obtained from the fixed excitation gain code, 159 is an adder that adds the fixed excitation to either the adaptive or the extended adaptive excitation, 160 is an LPC decoder that decodes the LPC coefficients from the LPC code, and 161 is an LPC synthesis filter that performs LPC synthesis on the driving excitation obtained by the adder 159.
[0015]
The operation of the decoding side of the speech coding apparatus configured as described above will be described with reference to FIG. 3. First, the separator 151 separates the encoded data into the LPC code, the adaptive or extended adaptive excitation code, the adaptive/extended adaptive excitation selection information, the fixed excitation code, and their gain codes. Next, according to the adaptive/extended adaptive excitation selection information, whichever of the adaptive excitation and the extended adaptive excitation was selected on the encoding side is generated from its excitation code and gain code, and the adder 159 adds it to the fixed excitation generated from the fixed excitation code and its gain code to produce the driving excitation. The LPC synthesis filter 161 then performs LPC synthesis on the driving excitation using the LPC coefficients decoded by the LPC decoder 160 to obtain the decoded speech. Finally, the extended adaptive excitation codebook 152 and the adaptive excitation codebook 154 are updated by the extended adaptive excitation codebook updater 162 using the encoded driving excitation signal. The extended adaptive excitation codebook is updated in the same way as on the encoding side.
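The decoder-side reconstruction described above can be sketched as follows. This is illustrative only: the function names are hypothetical, the gains are assumed to be already decoded, and the LPC sign convention A(z) = 1 + Σ a_i z^{-i} is an assumption of this sketch rather than something stated in the source.

```python
def build_excitation(selected, gain_selected, fixed, gain_fixed):
    """Adder 159 (sketch): gain-scaled selected excitation plus fixed excitation."""
    return [gain_selected * s + gain_fixed * c for s, c in zip(selected, fixed)]

def lpc_synthesize(excitation, lpc, memory=None):
    """All-pole synthesis 1/A(z) with assumed A(z) = 1 + sum(a_i * z^-i).

    lpc    : coefficients a_1 .. a_p
    memory : previous p output samples, most recent first (zeros if omitted)
    """
    order = len(lpc)
    past = list(memory) if memory is not None else [0.0] * order
    out = []
    for e in excitation:
        s = e - sum(a * p for a, p in zip(lpc, past))
        out.append(s)
        if order:
            past = [s] + past[:-1]  # shift the new sample into the filter memory
    return out
```

Here `selected` stands for whichever of the adaptive or extended adaptive excitation vectors the selection information indicates.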
[0016]
As described above, according to the first embodiment of the present invention, an extended adaptive excitation codebook that retains the characteristics of past driving excitations is newly provided, the driving excitation is expressed as the sum of the fixed excitation and either the adaptive or the extended adaptive excitation, and the excitation with the smaller perceptually weighted squared error is selected from the results of the adaptive excitation search and the extended adaptive excitation search. As a result, particularly for speech onsets and for signals containing ambient noise, excitations that could not be represented well by the conventional adaptive excitation alone can be represented more accurately, and higher speech quality can be realized.
[0017]
Note that the entire CELP configuration shown in FIG. 1 is one typical example, and the present invention is applicable to other CELP configurations.
[0018]
(Embodiment 2)
Next, a second embodiment of the present invention will be described. The configuration of the CELP speech coding apparatus in the second embodiment is the same as the encoding side shown in FIG. 1 and the decoding side shown in FIG. 3, but the operating procedure of the excitation search differs. FIG. 4 shows the excitation search procedure of the second embodiment. Unlike the first embodiment, the choice between the extended adaptive excitation and the adaptive excitation is made on the results obtained after a fixed codebook search has been performed for each of the extended adaptive excitation codebook search and the adaptive excitation codebook search. In FIG. 4, the adaptive excitation codebook search is performed first (S201), a fixed excitation search is performed for the adaptive excitation thus obtained (S202), and the gains of the adaptive excitation and the fixed excitation are then encoded (S203), which determines the optimal adaptive excitation, fixed excitation, and their gain codes. Next, the extended adaptive excitation codebook search, the fixed excitation search, and the gain encoding are performed in the same way for the extended adaptive excitation (S204, S205, S206), which determines the optimal extended adaptive excitation, fixed excitation, and their gain codes. Then, of the two combinations obtained, namely the adaptive excitation with its corresponding fixed excitation and the extended adaptive excitation with its corresponding fixed excitation, the one with the smaller perceptually weighted squared error is finally selected (S207). Although the gain encoding is performed after the fixed excitation search in FIG. 4, the gains may instead be encoded individually immediately after the adaptive excitation codebook search or the extended adaptive excitation codebook search. All other operations, including the update of the extended adaptive excitation codebook, are the same as in the first embodiment.
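The procedure of FIG. 4 can be sketched as two complete search branches followed by a comparison. The sketch makes several simplifying assumptions that are not in the source: candidate vectors are taken to be already passed through the perceptually weighted synthesis filter, gains are the unquantized least-squares optima determined one stage at a time, and all names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def best_match(target, candidates):
    """Return (index, gain, error) of the candidate closest to target."""
    best = None
    for i, c in enumerate(candidates):
        energy = dot(c, c)
        if energy == 0.0:
            continue
        gain = dot(target, c) / energy
        err = dot(target, target) - gain * dot(target, c)
        if best is None or err < best[2]:
            best = (i, gain, err)
    return best

def search_branch(target, first_stage, fixed):
    """First-stage search, then fixed search on the remaining residual."""
    i, g, _ = best_match(target, first_stage)
    residual = [t - g * v for t, v in zip(target, first_stage[i])]
    j, gf, err = best_match(residual, fixed)
    return {"index": i, "gain": g, "fixed_index": j, "fixed_gain": gf, "error": err}

def select_closed_loop(target, adaptive, extended, fixed):
    """S201-S207 (sketch): run both branches and keep the smaller final error."""
    a = search_branch(target, adaptive, fixed)   # S201-S203
    x = search_branch(target, extended, fixed)   # S204-S206
    return ("adaptive", a) if a["error"] <= x["error"] else ("extended", x)  # S207
```

Compared with selecting right after the first-stage searches, this roughly doubles the fixed codebook search cost in exchange for a decision based on the final error.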
[0019]
As described above, according to the second embodiment of the present invention, an extended adaptive excitation codebook that retains the characteristics of past driving excitations is newly provided, the driving excitation is expressed as the sum of the fixed excitation and either the adaptive or the extended adaptive excitation, and whichever of the two is optimal can be selected. Furthermore, the selection is made by performing a fixed excitation search for each of the adaptive excitation search and the extended adaptive excitation search and choosing the excitation with the smaller perceptually weighted squared error from those results. As a result, particularly for speech onsets and for signals containing ambient noise, excitations that could not be represented well by the conventional adaptive excitation alone can be represented more accurately, and higher speech quality can be realized.
[0020]
(Embodiment 3)
FIG. 5 shows the configuration of the coding side of the CELP speech coding apparatus according to the third embodiment of the present invention. In FIG. 5, 101 to 115 are the same as those in FIG. 1 showing the first and second embodiments. The difference from FIG. 1 is that a pitch analyzer 501 that outputs a value representing the degree of pitch periodicity is provided.
[0021]
The operation of the speech coding apparatus configured as described above will be described with reference to FIG. 5. Here, only the operations that differ from the first and second embodiments are described, namely those of the pitch analyzer 501 and of the adaptive excitation / extended adaptive excitation selector 107, which uses its result. Other operations are the same as those in the first and second embodiments. The pitch analyzer 501 performs pitch analysis on the input speech and outputs a value representing the degree of pitch periodicity, such as a pitch prediction gain or a normalized maximum cross-correlation value. Although the pitch analysis is performed on the input speech in FIG. 5, it can also be performed on the perceptually weighted input signal. When the adaptive excitation / extended adaptive excitation selector 107 selects between the adaptive excitation and the extended adaptive excitation, it refers to the value of the degree of pitch periodicity, and if the degree of pitch periodicity is higher than a certain threshold, it operates so that the adaptive excitation is selected unconditionally. When the degree of pitch periodicity is high, as in a stationary vowel section of speech, the adaptive excitation is generally preferable to the extended adaptive excitation, and selecting the extended adaptive excitation in such a section may, in rare cases, cause degradation. This can be prevented by the configuration of the present embodiment. This embodiment can be applied to both the first and second embodiments.
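The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the analyzer of the embodiment: the periodicity measure used here is a normalized maximum cross-correlation over a lag range, the threshold value 0.8 is an arbitrary placeholder, and the function names `normalized_max_xcorr` and `choose_excitation` are hypothetical.

```python
import math


def normalized_max_xcorr(signal, frame_start, frame_len, min_lag, max_lag):
    """Largest normalized cross-correlation between the current frame and
    the same-length segment `lag` samples earlier, over the lag range."""
    frame = signal[frame_start:frame_start + frame_len]
    e0 = sum(v * v for v in frame)
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        start = frame_start - lag
        if start < 0:
            break  # not enough history for this lag
        past = signal[start:start + frame_len]
        e1 = sum(v * v for v in past)
        if e0 <= 0.0 or e1 <= 0.0:
            continue
        corr = sum(a * b for a, b in zip(frame, past)) / math.sqrt(e0 * e1)
        best = max(best, corr)
    return best


def choose_excitation(periodicity, err_adaptive, err_extended, threshold=0.8):
    """Force the adaptive excitation when periodicity exceeds the threshold;
    otherwise pick the branch with the smaller weighted squared error.
    The default threshold is an assumed value for illustration only."""
    if periodicity > threshold:
        return "adaptive"
    return "extended" if err_extended < err_adaptive else "adaptive"
```

With a strongly periodic frame the measure approaches 1, so the adaptive excitation is chosen even if the extended adaptive branch happens to report a smaller error.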
[0022]
(Embodiment 4)
FIG. 6 shows the configuration of the coding side of the CELP speech coding apparatus according to the fourth embodiment of the present invention. In FIG. 6, 101 to 115 are the same as those in FIG. 1 showing the first and second embodiments. The difference from FIG. 1 is that a pitch period lag calculator 601 is provided, which receives the output lag value of the adaptive excitation, the extended adaptive excitation signal, and the past driving excitation signal, and outputs the pitch period lag value of the section concerned to the fixed excitation codebook, including when the extended adaptive excitation is selected.
[0023]
The operation of the speech coding apparatus configured as described above will be described with reference to FIG. 6. Here, only the operations that differ from the first and second embodiments, namely those of the pitch period lag calculator 601 and of the fixed excitation search, are described. Other operations are the same as those in the first and second embodiments. The pitch period lag calculator 601 receives the output lag value of the adaptive excitation, the extended adaptive excitation signal, and the past driving excitation signal, and outputs the pitch period lag value of the section concerned to the fixed excitation codebook, including when the extended adaptive excitation is selected. When the adaptive excitation is selected, the adaptive excitation lag obtained as a result of the search is output as it is. When the extended adaptive excitation is selected, a value corresponding to the pitch period lag of the section is calculated and output. As the calculation method, either the pitch period lag is obtained by finding the maximum residual correlation from the extended adaptive excitation signal of the section and the past driving excitation signal, or the adaptive excitation lag obtained in the section preceding the section concerned (or, if the adaptive excitation was not selected in the preceding section, the adaptive excitation lag from before that) is used as it is. However, when the pitch periodicity is low, for example when the maximum residual correlation value is small, information indicating that state is output instead. The output pitch period lag is used in those types of fixed excitation search that require this information. Examples are a search in which a noise excitation is repeated at the lag period over the excitation coding length, and a search in which a pulse excitation is repeated at the lag period.
The extended adaptive excitation is effective mainly in sections where the adaptive excitation does not function well, such as speech onsets, but it may also be selected in stationary sections such as vowels. When the extended adaptive excitation is selected in an onset section for speech whose pitch period lag is shorter than the excitation coding section length, the pitch period lag calculator 601 calculates the pitch period lag, and using this lag in a fixed excitation search of a type that requires the pitch period lag enables higher-quality encoding.
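The behavior of the pitch period lag calculator and the lag-periodic fixed excitation can be sketched as follows. This is a hedged illustration, not the calculator of the embodiment: the text does not specify how the maximum residual correlation is computed, so the sketch assumes a normalized correlation between the extended adaptive excitation and the concatenated excitation history delayed by each candidate lag. The minimum correlation 0.4 and the names `estimate_lag`, `pitch_period_lag`, and `repeat_at_lag` are hypothetical.

```python
import math


def estimate_lag(past_excitation, extended_excitation,
                 min_lag, max_lag, min_corr=0.4):
    """Return the lag with the highest normalized correlation, or None when
    no lag exceeds `min_corr` (treated here as 'low pitch periodicity')."""
    buf = list(past_excitation) + list(extended_excitation)
    start = len(past_excitation)
    n = len(extended_excitation)
    current = buf[start:]
    e0 = sum(v * v for v in current)
    best_lag, best_corr = None, min_corr
    for lag in range(min_lag, min(max_lag, start) + 1):
        delayed = buf[start - lag:start - lag + n]
        e1 = sum(v * v for v in delayed)
        if e0 <= 0.0 or e1 <= 0.0:
            continue
        corr = sum(a * b for a, b in zip(current, delayed)) / math.sqrt(e0 * e1)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


def pitch_period_lag(selected, adaptive_lag, past_excitation,
                     extended_excitation, min_lag, max_lag):
    """Pass the adaptive lag through unchanged; otherwise estimate a lag."""
    if selected == "adaptive":
        return adaptive_lag
    return estimate_lag(past_excitation, extended_excitation, min_lag, max_lag)


def repeat_at_lag(vector, lag, length):
    """Repeat the first `lag` samples of a fixed excitation vector to fill
    `length` samples; leave it unrepeated when no usable lag is available."""
    if lag is None or lag >= length:
        return list(vector[:length])
    return [vector[i % lag] for i in range(length)]
```

Returning `None` stands in for the "information indicating that state" that the calculator outputs when the periodicity is low; a fixed excitation search that needs the lag can then fall back to its non-periodic form.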
[0024]
[Effects of the invention]
As is clear from the first and second embodiments, the present invention newly provides an extended adaptive excitation codebook in which the characteristics of past driving excitations are retained, represents the driving excitation as the sum of a fixed excitation and either an adaptive excitation or an extended adaptive excitation, and selects whichever excitation has the smaller perceptually weighted squared error based on the results of the adaptive excitation search and the extended adaptive excitation search. By using the extended adaptive excitation, in which the characteristics of excitations from further in the past are retained, higher speech quality can be realized at a low bit rate. In particular, for signals containing speech onsets or ambient noise, excitations that could not be expressed well by the conventional adaptive excitation alone can be expressed more accurately, and higher speech quality can be realized.
[0026]
Further, as is clear from the third embodiment, the present invention refers to the value representing the degree of pitch periodicity obtained by the pitch analyzer when the adaptive excitation / extended adaptive excitation selector selects between the adaptive excitation and the extended adaptive excitation, and selects the adaptive excitation unconditionally when the degree of pitch periodicity is higher than a certain threshold. This prevents the degradation that could occur if the extended adaptive excitation were selected in a section with a high degree of pitch periodicity, such as a stationary vowel section.
[0027]
Further, as is apparent from the fourth embodiment, the present invention provides a pitch period lag calculator that receives the output lag value of the adaptive excitation, the extended adaptive excitation signal, and the past driving excitation signal, and outputs the pitch period lag value of the section concerned to the fixed excitation codebook, including when the extended adaptive excitation is selected. As a result, a method that uses pitch period information can be applied in the fixed excitation search even when the extended adaptive excitation is selected, which has the effect of improving quality.
[Brief description of the drawings]
FIG. 1 is a block diagram of the coding side of a speech coding apparatus according to Embodiments 1 and 2 of the present invention.
FIG. 2 is a flowchart showing the operation procedure of the excitation coding unit according to Embodiment 1 of the present invention.
FIG. 3 is a block diagram of the decoding side of the speech coding apparatus according to Embodiments 1 and 2 of the present invention.
FIG. 4 is a flowchart showing the operation procedure of the excitation coding unit according to Embodiment 2 of the present invention.
FIG. 5 is a block diagram of the coding side of the speech coding apparatus according to Embodiment 3 of the present invention.
FIG. 6 is a block diagram of the coding side of the speech coding apparatus according to Embodiment 4 of the present invention.
FIG. 7 is a block diagram of the coding side of a conventional speech coding apparatus.
[Explanation of symbols]
101 LPC analyzer / quantizer
102 Perceptual weighting filter
103 Extended adaptive excitation codebook
104 Gain multiplier
105 Adaptive excitation codebook
106 Gain multiplier
107 Adaptive excitation / extended adaptive excitation selector
108 Fixed excitation codebook
109 Gain multiplier
110 Adder
111 Perceptually weighted LPC synthesis filter
112 Subtractor
113 Square error minimizer
114 Extended adaptive excitation codebook updater
115 Multiplexer
151 Separator
152 Extended adaptive excitation codebook
153 Gain multiplier
154 Adaptive excitation codebook
155 Gain multiplier
156 Adaptive excitation / extended adaptive excitation selector
157 Fixed excitation codebook
158 Gain multiplier
159 Adder
160 LPC decoder
161 LPC synthesis filter
501 Pitch analyzer
601 Pitch period lag calculator
901 LPC analyzer / quantizer
902 Perceptual weighting filter
903 Adaptive excitation codebook
904 Gain multiplier
905 Fixed excitation codebook
906 Gain multiplier
907 Adder
908 Perceptually weighted LPC synthesis filter
909 Subtractor
910 Square error minimizer
911 Multiplexer

Claims

A speech encoding apparatus comprising at least an extended adaptive excitation codebook, obtained by updating with past encoded driving excitations, as an excitation codebook used for encoding the excitation signal of each fixed interval, wherein the extended adaptive excitation codebook consists of one or more codebook vectors each having a size equal to or greater than the encoded excitation vector length, and the extended adaptive excitation codebook is updated by determining, by search, a section within the extended adaptive excitation codebook that is similar to a past driving excitation vector and adding the driving excitation vector to that section by weighted addition.

A speech encoding apparatus comprising: as excitation codebooks for CELP speech coding, an adaptive excitation codebook that sequentially stores past driving excitations, an extended adaptive excitation codebook obtained by updating with past encoded driving excitations, and a fixed excitation codebook that stores fixed excitation code vectors, the extended adaptive excitation codebook consisting of one or more codebook vectors each having a size equal to or greater than the encoded excitation vector length and being updated by determining, by search, a section within the extended adaptive excitation codebook that is similar to a past driving excitation vector and adding the driving excitation vector to that section by weighted addition; an extended adaptive excitation codebook updater that updates the extended adaptive excitation codebook with past driving excitations; an adaptive excitation / extended adaptive excitation selector that selects either the output of the adaptive excitation codebook or the output of the extended adaptive excitation codebook; and an adder that adds the fixed excitation to either the adaptive excitation or the extended adaptive excitation.

The speech encoding apparatus according to claim 2, wherein the adaptive excitation / extended adaptive excitation selector selects whichever excitation has the smaller perceptually weighted squared error based on the results of the adaptive excitation search and the extended adaptive excitation search.

The speech encoding apparatus according to claim 2, wherein the adaptive excitation / extended adaptive excitation selector selects whichever excitation has the smaller perceptually weighted squared error based on the results of the adaptive excitation search, the extended adaptive excitation search, and the fixed excitation search performed for each of the excitations obtained thereby.

The speech encoding apparatus according to claim 3 or 4, further comprising a pitch analyzer that performs pitch analysis on the input speech or the perceptually weighted input speech and outputs a value representing the degree of pitch periodicity, such as a pitch prediction gain, wherein the adaptive excitation / extended adaptive excitation selector refers to the value of the degree of pitch periodicity when selecting between the adaptive excitation and the extended adaptive excitation, and selects the adaptive excitation unconditionally when the degree of pitch periodicity is higher than a certain threshold.

The speech encoding apparatus according to claim 3 or 4, further comprising a pitch period lag calculator that receives the output lag value of the adaptive excitation, the extended adaptive excitation signal, and the past driving excitation signal, calculates the pitch period lag value of the section concerned, including when the extended adaptive excitation is selected, and outputs it to the fixed excitation codebook.

A recording medium, such as a magnetic disk, a magneto-optical disk, or a ROM cartridge, on which is recorded a program for implementing in software the speech encoding apparatus according to any one of claims 1 to 6.