JP3579276B2

JP3579276B2 - Audio encoding / decoding method

Info

Publication number: JP3579276B2
Application number: JP36783698A
Authority: JP
Inventors: 皇天田; 公生三関
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1997-12-24
Filing date: 1998-12-24
Publication date: 2004-10-20
Anticipated expiration: 2018-12-24
Also published as: JPH11259098A

Description

【０００１】
【発明の属する技術分野】
本発明は、ディジタル電話、ボイスメモなどに用いられる低符号化レートの音声符号化／復号化方法に関する。
【０００２】
【従来の技術】
近年、携帯電話やインターネットなどで音声や楽音を少ない情報量に圧縮して伝送、蓄積するための符号化技術として、ＣＥＬＰ方式（Code Excited Linear Prediction）（M. R. Schroeder and B. S. Atal, "Code Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates," Proc. ICASSP, pp. 937-940, 1985（文献１）およびW. S. Kleijin, D. J. Krasinski et al., "Improved Speech Quality and Efficient Vector Quantization in SELP," Proc. ICASSP, pp. 155-158, 1988（文献２））がよく用いられている。
【０００３】
ＣＥＬＰは線形予測分析に基づく符号化方式であり、入力音声信号は線形予測分析によって音韻情報を表す線形予測係数と音の高さ等を表す予測残差信号に分けられる。線形予測係数を基に合成フィルタと呼ばれる再帰型のディジタルフィルタが構成され、この合成フィルタに予測残差信号が駆動信号として入力されることで、元の入力音声信号に復元できる。
【０００４】
低レートで符号化するためには、合成フィルタの特性を表す合成フィルタ情報である線形予測係数と、合成フィルタを駆動する駆動信号である予測残差信号をより少ない情報量で符号化する必要がある。ＣＥＬＰ方式では、ピッチベクトルと雑音ベクトルの２種類の信号に適当なゲインを乗じた後、足し合わせることによって、予測残差信号を符号化した信号が駆動信号として生成される。ピッチベクトルの生成方法は例えば文献２に述べられている。
【０００５】
文献２の方法の他に音声立上り部（ｏｎｓｅｔ）で固定の符号ベクトルを用いる方法なども提案されているが本発明ではこれらをまとめてピッチベクトルと呼ぶことにする。
雑音ベクトルは通常、多数の候補を雑音符号帳に格納しておき、この中から最適なものを選ぶことによって生成される。雑音ベクトルの探索方法として、全ての雑音ベクトルをピッチベクトルと足し合わせた後に合成フィルタに通して合成音声信号を生成し、この合成音声信号の入力音声信号に対する歪みを評価し、最も歪みの小さい合成音声信号を生成する雑音ベクトルを選ぶという方法がとられる。従って、如何に効率良く雑音ベクトルを雑音符号帳に格納しておくかがＣＥＬＰ方式の重要なポイントになる。
【０００６】
代数構造符号帳（Algebraic Codebook）（J-P. Adoul et al., "Fast CELP coding based on algebraic codes," Proc. ICASSP '87, pp. 1957-1960（文献３））は、雑音ベクトルをパルスの有無と極性（＋，−）だけで表す簡単な構造である。代数構造符号帳は複数の雑音ベクトルを格納した雑音符号帳を用いた方式に比べ、コードベクトルを格納する必要がなく、また計算量が少ないなどの特徴を持つ。音質の面でも従来の方式に比べて遜色がないため、近年、様々な標準方式に用いられている。
【０００７】
【発明が解決しようとする課題】
しかしながら、代数構造符号帳は符号化のビットレート（符号化レート）が下がるに従い、音質の劣化が目立つようになる。その理由の一つとして、パルスの位置情報の不足が挙げられる。すなわち、代数構造符号帳ではパルスの位置情報を代数的に単純化しているため、上述した利点はあるが、低符号化レートではパルスを立てる必要の無い箇所に位置候補が存在し、必要な個所に存在しないことがあるため、効率が悪いばかりでなく、音声の品質が劣化してしまう。
【０００８】
代数構造符号帳を用いた場合に音質が劣化するもう一つの理由として、パルス数の不足が挙げられる。パルス数が不足すると、復号音声に「プチプチ」という雑音が目立つようになる。これは駆動信号がパルス列から生成されているためであり、パルス数の減少とともにパルスの有無が聴覚的に知覚されやすくなるからである。音質の向上のためには、このプチプチ感を軽減させる必要がある。
【０００９】
上述したように、従来の代数構造符号帳は構造が簡単であり、計算量が少ないという利点を有する反面、低符号化レートになると合成フィルタの駆動信号を構成するパルス列の位置情報およびパルス数の不足により復号音声の音質が低下するという問題点があった。
【００１０】
本発明は、低符号化レートでも良好な音質が得られる音声符号化／復号化方法を提供することを目的とする。
【００１１】
【課題を解決するための手段】
本発明は、音声信号を少なくとも合成フィルタの特性を表す情報を生成するステップと、該合成フィルタを駆動するための信号であり、前記音声信号の性質に応じて適応的に変化するパルス位置候補から選ばれた所定の数のパルス位置にパルスを配置することで生成されたパルス列を含む駆動信号を生成するステップとでなる音声符号化方法を提供する。
【００１２】
本発明は、音声信号の性質に応じて適応的に変化するパルス位置候補から選ばれた所定の数のパルス位置にパルスを配置することで生成されたパルス列を含む駆動信号を合成フィルタに入力して音声信号を復号化する音声復号化方法を提供する。
【００１３】
本発明に係る音声符号化／復号化方法では、合成フィルタを駆動する駆動信号は音声信号の性質に応じて適応的に変化するパルス位置候補から選ばれた所定の数のパルス位置にパルスを配置することで生成されたパルス列を含んでいる。パルス位置候補は、より具体的には音声信号のパワ（ｐｏｗｅｒ）の大きい所ほど多くの候補が存在するように配置される。
【００１４】
また、駆動信号は音声信号の性質に応じて適応的に変化するパルス位置候補全てにパルスを配置し、各パルスの振幅を所定の手段で最適化することで生成されたパルス列を含んで構成することもできる。この場合、パルス位置候補はより具体的には、音声信号のパワの大きい所ほど多くの候補が存在するように配置される。
【００１５】
さらに、駆動信号は音声信号の性質に応じて適応的に変化する第１のパルス位置候補から選ばれた所定の数のパルス位置にパルスを配置することで生成されたパルス列か、又は、第１のパルス位置候補として用いられなかった位置の一部または全部からなる第２のパルス位置候補から選ばれた所定の数のパルス位置にパルスを配置することで生成されたパルス列のいずれかを用いて生成することもできる。この場合、第１のパルス位置候補は、より具体的には、音声信号のパワの大きい所ほど多くの候補が存在するように配置される。
【００１６】
また、駆動信号がピッチベクトルおよび雑音ベクトルからなる場合には、雑音ベクトルがピッチベクトルの形状に応じて変化するパルス位置候補から選ばれた所定の数のパルス位置にパルスを配置することで生成される。この場合、パルス位置候補はより具体的には、ピッチベクトルのパワの大きい所ほど多くの候補が存在するように配置される。
【００１７】
また、雑音ベクトルがピッチベクトルの形状から求められた位置候補密度関数に基づき設定された位置候補から選ばれた所定の数のパルス位置にパルスを配置することで生成されたパルス列を用いて構成とすることもできる。この場合、パルス位置候補はより具体的には、位置候補密度関数の値の大きい所ほど多くの候補が存在するように配置され、位置候補密度関数はピッチベクトルのパワとパルスが配置される確率を関連付ける予め求められた関数である。
【００１８】
さらに、雑音ベクトルにピッチ周期強調フィルタなどの補正手段を用いる場合には、ピッチベクトルにこの逆特性に基づく処理を行った逆補正ピッチベクトルの形状に応じて変化するパルス位置候補から選ばれた所定の数のパルス位置にパルスを配置することで生成される。この場合、パルス位置候補はより具体的には、逆補正ピッチベクトルのパワの大きい所ほど多くの候補が存在するように配置される。
【００１９】
このようにパルス位置候補を音声信号のパワー分布などの性質に応じて適応的に変化させることにより、低符号化レート化によってパルス位置やパルス数が削減された代数構造符号帳を用いた場合でも符号化効率が向上し、復号音声の音質を維持しつつ低符号化レート化を図ることができる。また、パルス位置候補の作成にピッチベクトルを用いることで、付加情報を必要とせずにパルス位置候補の適応化が可能となる。
【００２０】
本発明に係る他の音声符号化／復号化方法では、駆動信号がピッチベクトルおよび雑音ベクトルからなる場合、ピッチベクトルの形状を基に決められた特性を持つパルス整形手段によって整形されたパルス列を含む駆動信号が生成される。
【００２１】
このような構成によって、パルス数の減少による復号音声に含まれるパルス状の雑音が軽減され、低符号化レート化によってパルス位置やパルス数が削減された場合でも、復号音声の音質を維持しつつ低符号化レート化が可能となる。
【００２２】
さらに、本発明に係る音声符号化／復号化方法においては、音声信号の性質に応じて適応的に変化するパルス位置候補から選ばれた所定の数のパルス位置にパルスを配置することで生成されたパルス列を含む駆動信号を生成し、かつこのパルス列をピッチベクトルの形状を基に決められた特性を持つパルス整形手段によって整形してもよい。
【００２３】
【発明の実施の形態】
図１に、第１の実施形態に係る音声符号化方法を適用した音声符号化システムが示される。この音声符号化システムは、入力端子１０１，１０６と、ＬＰＣ分析部１１０と、ＬＰＣ量子化部１１１と、ＬＰＣ合成部１２０と、聴覚重み付け部１３０と、適応符号帳１４１と、パルス位置候補探索部１４２と、適応代数構造符号帳１４３と、符号選択部１５０と、ピッチ周期強調部１６０と、利得乗算部１０２，１０３および加算部１０４，１０５から構成される。
【００２４】
入力端子１０１には、符号化すべき入力音声信号が１フレーム分の長さの単位で入力され、これに同期してＬＰＣ分析部１１０で線形予測分析が行われることにより、声道特性に相当する線形予測係数（ＬＰＣ係数）が求められる。ＬＰＣ係数はＬＰＣ量子化部１１１で量子化され、この量子化値がＬＰＣ合成部１２０にＬＰＣ合成部１２０の特性を表す合成フィルタ情報として入力されると共に、量子化値を指し示すインデックスＡが符号化結果として図示しない多重化部へ出力される。
【００２５】
適応符号帳１４１には、過去にＬＰＣ合成部１２０に入力された駆動信号が格納されている。ＬＰＣ合成部１２０の入力となる駆動信号は、線形予測分析における予測残差信号を量子化した信号であり、音の高低の情報などを含む声帯信号に相当する。適応符号帳１４１は過去の駆動信号からピッチ周期に相当する長さの波形を切り出し、これを繰り返すことでピッチベクトルを生成する。ピッチベクトルは通常、フレームを幾つかに分割したサブフレーム単位で求められる。
【００２６】
パルス位置候補探索部１４２では、適応符号帳１４１で求められたピッチベクトルを基に、サブフレーム内のどの位置にパルス位置候補を設定するかを計算で求め、その結果を適応代数構造符号帳１４３に出力する。
【００２７】
適応代数構造符号帳１４３は、パルス位置候補探索部１４２から入力されたパルス位置候補の中から、ピッチベクトルの影響を差し引いた入力音声信号に対する歪みが聴覚重みの下で最小となるように、所定の本数分のパルス位置とその符号を探索する。
【００２８】
適応代数構造符号帳１４３の出力であるパルス列は、必要に応じてピッチ周期強調部１６０によってピッチ単位で周期化される。ピッチ周期強調部１６０では、入力端子１０６から適応符号帳１４１の探索で求められたピッチ周期の情報Ｌが入力され、パルス列にピッチ周期の周期性が与えられる。
【００２９】
適応符号帳１４１から出力されるピッチベクトルおよび適応代数構造符号帳１４３から出力され、かつ必要に応じてピッチ周期強調部１６０で周期性が与えられたパルス列は、利得乗算部１０２，１０３によりピッチベクトルに対する利得Ｇ０および雑音ベクトルに対する利得Ｇ１がそれぞれ乗じられた後、加算部１０４で加え合わせられ、ＬＰＣ合成部１２０に駆動信号として入力される。なお、利得Ｇ０，Ｇ１としては通常、複数の利得を格納した利得符号帳（図示していない）から最適な利得が選ばれる。
【００３０】
符号選択部１５０からは、適応符号帳１４１に対する探索で選ばれたピッチベクトルを示すインデックスＢと、適応代数構造符号帳１４３に対する探索で選ばれたパルス列を示すインデックスＣと、利得符号帳に対する探索で選ばれた利得Ｇ０，Ｇ１を示すインデックスＧが出力される。これらの各インデックスＢ，Ｃ，ＧとＬＰＣ量子化部１１１からのＬＰＣ係数の量子化値である合成フィルタ情報を示すインデックスＡが図示しない多重化部で多重化され、ビットストリームとして出力される。
【００３１】
次に、本実施形態の特徴部分であるパルス位置候補探索部１４２と適応代数構造符号帳１４３について説明する。
【００３２】
本実施形態では低符号化レート時にパルスが立つ位置を制限しても、従来のように音質を劣化させずに符号化レートだけを低減させることができるようにするために、パルスは駆動信号のパワの大きい所に集中して立つ性質を利用し、駆動信号のパワの大きい所ほど多くの位置候補が割り振られるようにサブフレーム毎にパルス位置候補が設定される。
【００３３】
ピッチベクトルは理想的な駆動信号の形状と似ているため、適応符号帳１４１の探索により求められたピッチベクトルに基づいてパルス位置候補探索部１４２でパルス位置候補を設定することは効果的である。ピッチベクトルは、復号化側でも符号化側と同一のものが求められるため、パルス位置候補の適応化に伴って余分な付加情報を発生させる必要はない。
【００３４】
パルス位置候補の適応化に際して、パワの大きい所のみに位置候補を割り振ると、パワの小さな区間では連続して位置候補が存在しなくなることが原因で音質が劣化することもある。パルス位置候補の適応化の方法は様々な方法が考えられるが、例えば以下のような方法をとることにより音質劣化の少ない適応化が可能である。
【００３５】
図２に示すフローチャートを用いて、パルス位置候補探索部１４２によるパルス位置候補の適応化の処理手順を説明する。また、図３に図２の各ステップにおける入力ピッチベクトル波形（Ｆ０）、この入力ピッチベクトル波形のパワ（Ｆ１）、平滑化したパワ（Ｆ２）、この平滑化したパワをサンプル方向に積分した値（Ｆ３）を図２に対応させてそれぞれ示す。
【００３６】
パワの他に振幅値の絶対値（パワの平方根）など波形の形状を表す他の尺度を用いても同様の処理が可能である。本発明ではこれらをまとめてパワで代表することにする。
【００３７】
まず最初に、図３の入力ピッチベクトル（Ｆ０）について、パワ（Ｆ１）を算出し（ステップＳ１）、次いでパワ（Ｆ１）を平滑化し、平滑化パワ（Ｆ２）を得る（ステップＳ２）。パワの平滑化には、例えば数サンプルの窓で重みを付けて移動平均をとるなどの方法がある。
【００３８】
次に、ステップＳ２で平滑化されたパワをサンプル方向に積分する（ステップＳ３）。この様子が図３の（Ｆ３）に示されている。具体的には、ｎ番目のサンプルの平滑化されたパワをｐ（ｎ）、この平滑化されたパワｐ（ｎ）の積分値をｑ（ｎ）、サブフレーム長をＬとすると、積分値ｑ（ｎ）は
$$q(n) = p(n) + q(n-1) + C \qquad (n = 0, \ldots, L-1)$$
で求められる。ただし、Ｃは定数であり、パルス位置候補の密度の偏りの度合いを調節する。
【００３９】
次に、この積分値ｑ（ｎ）を用いてパルス位置候補の算出を行う（ステップＳ４）。この場合、最終サンプルでの積分値が、求める位置候補数Ｍになるように積分値を正規化する。ｍ番目の候補の位置は、図３の（Ｆ３）に示したように積分値と対応させることで、Ｓｍとして求めることができる。これをｍ＝０，…，Ｍ−１まで繰り返すことでＭ個の位置候補を求めることができる。
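Steps S1 to S4 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's reference implementation: the smoothing window weights, the initial value q(-1) = 0, the rule that candidate m sits at the first sample where the normalised integral reaches m + 0.5, and the handling of coinciding positions are all choices made here for illustration. The function name `pulse_position_candidates` is hypothetical.

```python
def pulse_position_candidates(pitch_vec, num_candidates, c=0.01,
                              window=(0.25, 0.5, 0.25)):
    """Return num_candidates sample positions, denser where the pitch vector has more power."""
    length = len(pitch_vec)
    if not 0 < num_candidates <= length:
        raise ValueError("need 0 < num_candidates <= subframe length")

    # Step S1: power of each sample.
    power = [x * x for x in pitch_vec]

    # Step S2: weighted moving average (window weights are an assumption);
    # indices outside the subframe are clamped to the nearest edge sample.
    half = len(window) // 2
    smoothed = []
    for n in range(length):
        acc = 0.0
        for k, w in enumerate(window):
            idx = min(max(n + k - half, 0), length - 1)
            acc += w * power[idx]
        smoothed.append(acc)

    # Step S3: q(n) = p(n) + q(n-1) + C, assuming q(-1) = 0.
    q, running = [], 0.0
    for p in smoothed:
        running += p + c
        q.append(running)
    if q[-1] <= 0.0:
        raise ValueError("integral is zero; use c > 0 or a non-zero pitch vector")

    # Step S4: normalise so that q(L-1) equals M, then read off one position
    # per unit of the normalised integral (the m + 0.5 threshold is an assumption).
    scale = num_candidates / q[-1]
    positions, n = [], 0
    for m in range(num_candidates):
        while n < length - 1 and q[n] * scale < m + 0.5:
            n += 1
        positions.append(n)

    # Keep positions distinct: push duplicates forward, then pull any
    # overflow back inside the subframe (also an assumption of this sketch).
    for i in range(1, num_candidates):
        if positions[i] <= positions[i - 1]:
            positions[i] = positions[i - 1] + 1
    for i in range(num_candidates - 1, -1, -1):
        positions[i] = min(positions[i], length - num_candidates + i)
    return positions
```

A larger `c` flattens the candidate density, while a smaller `c` concentrates candidates more strongly around the power peaks, which matches the role the text assigns to the constant C.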
【００４０】
図４に、このようにして求められたパルス位置候補とピッチベクトルのパワとの関係を示す。実線はピッチベクトルのパワ包絡、矢印はパルス位置候補を示している。同図に示されるように、パルス位置候補の分布はピッチベクトルのパワの大きいところでは密となり、パワが小さくなるに従って疎になってゆく。その結果、音質上重要なピッチベクトルのパワの大きいところでは、より正確にパルス位置を選ぶことができる。また、低符号化レート化によってパルス位置候補の数が減少しても、少ないパルス位置候補をピッチベクトルのパワの大きい所に適応的に集中させることで、高音質の符号化が可能となる。
【００４１】
次に、このようにして求められた位置候補をチャネル毎に分配する（ステップＳ５）。分配の方法も様々であるが、図３の（Ｆ４）に示したように位置候補は各チャネルが互い違いになるように分配されるのが望ましい。このようにして、適応代数構造符号帳１４３が求められる。探索では、この適応代数構造符号帳１４３の各チャネル（Ｃｈ１，Ｃｈ２，Ｃｈ３）から１パルスずつ最適な位置と符号が選ばれ、３本のパルスで構成される雑音ベクトルが生成される。
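The channel distribution of step S5 and the resulting three-pulse noise vector can be sketched as below. The round-robin assignment is one way to realise the "interleaved" distribution described above; the function names are hypothetical, and the actual search for the best position and sign per channel (which minimises perceptually weighted distortion) is not shown.

```python
def distribute_to_channels(candidates, num_channels=3):
    """Assign sorted position candidates to channels in round-robin (interleaved) order."""
    ordered = sorted(candidates)
    return [ordered[ch::num_channels] for ch in range(num_channels)]


def noise_vector(subframe_len, pulses):
    """Build a noise vector from one (position, sign) pair per channel."""
    vec = [0] * subframe_len
    for position, sign in pulses:
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        vec[position] += sign
    return vec
```

For example, `distribute_to_channels([2, 5, 7, 9, 12, 15, 20])` gives Ch1 the positions 2, 9, 20, Ch2 the positions 5, 12, and Ch3 the positions 7, 15; a codeword then picks one position and one sign from each channel.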
【００４２】
サブフレーム長が８０サンプルの場合、パルス候補位置を全チャネル合計で４０サンプル程度に削減しても、上記の手法を用いれば聴覚的な劣化はほとんど感じられなくなる。
【００４３】
代数構造符号帳ではパルスの振幅は通常＋１または−１のどちらかであるが、振幅情報を持つパルスを用いる方法も提案されている。例えば、文献４（Chang Deyuan, "An 8 kb/s low complexity ACELP speech codec," 1996 3rd International Conference on Signal Processing, pp. 671-4, 1996）に示されているように、パルスの振幅を１．０，０．５，０，−０．５，−１．０の中から選択する方法があげられる。また、文献５（K. Ozawa and T. Araseki, "Low Bit Rate Multi-pulse Speech Coder with Natural Speech Quality," IEEE Proc. ICASSP '86, pp. 457-460, 1986）に示されているパルス音源の一種であるマルチパルス方式なども駆動信号が振幅を持つパルス列から構成される。本発明はこれらの例に代表されるようなパルスが振幅をもつ場合にも適用可能である。
【００４４】
次に、図５を用いて図１の音声符号化システムに対応する音声復号化システムについて説明する。
【００４５】
図１と同一機能を有する部分に同一符号を付して説明すると、図５の音声復号化システムは、ＬＰＣ合成部１２０と、ＬＰＣ逆量子化部１２１と、適応符号帳１４１と、パルス位置候補探索部１４２と、適応代数構造符号帳１４３と、ピッチ周期強調部１６０と、利得乗算部１０２，１０３および加算部１０４から構成され、図１の音声符号化システムから伝送されてきた符号化ストリームが入力される。
【００４６】
入力された符号化ストリームは図示しない逆多重化部１２１に入力され、この逆多重化部１２１によって前述した合成フィルタ情報のインデックスＡ、適応符号帳１４１に対する探索で選ばれたピッチベクトルを示すインデックスＢ、適応代数構造符号帳１４３に対する探索で選ばれたパルス列を表すインデックスＣ、利得符号帳に対する探索で選ばれた利得Ｇ０，Ｇ１を示すインデックスＧおよびピッチ周期を示すインデックスＬに分離されて取り出される。
【００４７】
インデックスＡは、ＬＰＣ逆量子化部１２１で復号されて合成フィルタ情報であるＬＰＣ係数が求められ、ＬＰＣ合成部１２０に入力される。インデックスＢおよびＣは、適応符号帳１４１および適応代数構造符号帳１４３にそれぞれ入力され、これらの符号帳１４１，１４３からピッチベクトルおよびパルス列が出力される。この場合、適応代数構造符号帳１４３は、適応符号帳１４１から入力されたピッチベクトルに基づいてパルス位置候補探索部１４２で生成された適応代数構造符号帳１４３とインデックスＢから、パルス位置と符号を決定してパルス列を出力する。適応代数構造符号帳１４３から出力されるパルス列は、必要に応じてピッチ周期強調部１６０によりピッチ周期Ｌの周期性が与えられる。
【００４８】
適応符号帳１４１から出力されるピッチベクトルおよび適応代数構造符号帳１４３から出力され、かつ必要に応じてピッチ周期強調部１６０で周期性が与えられたパルス列は、利得乗算部１０２，１０３によりピッチベクトルに対する利得Ｇ０および雑音ベクトルに対する利得Ｇ１がそれぞれ乗じられた後、加算部１０４で加え合わせられてＬＰＣ合成部１２０に駆動信号として入力され、このＬＰＣ合成部１２０から再生音声信号が出力される。利得Ｇ０，Ｇ１は、インデックスＧに従って図示しない利得符号帳から選ばれる。
【００４９】
このように本実施形態によれば、音声の品質を維持したまま、ビットレートのみを削減することが可能となり、低符号化レートで高音質の音声符号化／復号化を実現することができる。
【００５０】
図６に、本発明の第２の実施形態に係る音声符号化システムが示される。この音声符号化システムは、第１の実施形態による図１に示した構成からパルス位置候補探索部１４２および適応代数構造符号帳１４３を取り除き、適応代数構造符号帳１４３に代わるものとして一般的な雑音符号帳１４４を備え、さらにパルス整形フィルタ分析部１６１とパルス整形部１６２が追加された構成となっている。
【００５１】
次に、本実施形態の処理手順について説明すると、入力音声信号のＬＰＣ分析およびＬＰＣ量子化を行った後、適応符号帳１４１の探索を行う所までは、第１の実施形態と同じである。雑音符号帳１４４は、この例では例えば代数構造符号帳により構成される。
【００５２】
パルス整形フィルタ分析部１６１は適応符号帳１４１の探索で求められたピッチベクトルに基づいてパルス整形部１６２のフィルタ係数を決定して出力する。パルス整形部１６２は、雑音符号帳１４４の出力を整形し雑音ベクトルとして出力する。
【００５３】
第１の実施形態と同様に、必要に応じてピッチ周期強調部１６０を用いて雑音ベクトルが周期化され、ピッチベクトルと雑音ベクトルに対する利得Ｇ０，Ｇ１が決められインデックスが出力される。パルス整形部１６２のフィルタ係数はピッチベクトルから求められるため、新たな付加情報を必要としない。
【００５４】
本実施形態の特徴は、パルス整形部１６２をピッチベクトルの波形を基に設定し、代数構造符号帳からなる雑音符号帳１４４の出力であるパルス列にパルス整形を施す点にある。第１の実施形態で述べたように、低符号化レート化に伴ってパルス位置、パルス数が減少し音質の劣化が目立つようになる。パルス数が減少した場合は「プチプチ」という雑音が復号音声に目立つようになるが、本実施形態のようにパルス整形部１６２を用いることで、このプチプチ感が大幅に軽減される。
【００５５】
パルス整形部１６２の設計方法としては、様々な方法を用いることができる。第一の例として、合成フィルタを駆動する駆動信号を位相等化すると、それがパルス状の信号になるという性質を利用する方法が考えられる。位相等化の逆フィルタを用いれば、パルス状の信号を入力することで駆動信号状の波形が得られることになる。従来のパルス波形を用いた場合のデメリットは理想的な駆動信号に含まれている位相情報が欠如してしまう点であり、パルス数が少なくなるとこの問題が顕著になる。そこで、この例のように位相情報をパルス整形部１６２で付加することで、パルス波形からより理想的な駆動信号に近い波形を生成することができる。
【００５６】
この第一の例では、位相等化逆フィルタのフィルタ係数の情報を伝送する必要があり、その分だけ符号化レート（ｂｉｔｒａｔｅ）が増える。そこで、パルス整形部１６２の第二の例として、位相情報の近似としてピッチベクトルを用いる方法が考えられる。有音区間などではピッチベクトルは、駆動信号と形状が類似しているため、位相情報を取り出すことができる。
【００５７】
具体的な方法の一つとして、ピッチベクトルのピーク位置などの同期点を求め、この同期点から数サンプル分の波形を取り出し、これをインパルス応答とするパルス整形フィルタを用いることができる。取り出す波形の長さは２〜３サンプル程度で効果が現われる。また、取り出したサンプルに窓をかけて減衰させてそれを用いるのも効果がある。さらに、ピッチベクトルは復号側でも符号化側と同一のものが得られるため、新たな伝送ビットを必要としない利点もある。雑音符号帳１４４の探索時には、パルス整形部１６２は一定であるため、そのインパルス応答をＬＰＣ合成部１２０と合わせて予め計算しておくことで、計算量を削減することができる。
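The second design example above (a pulse shaping filter whose impulse response is a few samples cut from the pitch vector at a synchronisation point) might look like the following sketch. The choice of the largest-magnitude sample as the synchronisation point follows the text; normalising the response so that its first tap is 1, and the function names, are assumptions made here for illustration.

```python
def shaping_impulse_response(pitch_vec, length=3, window=None):
    """Cut `length` samples starting at the pitch-vector peak and use them as an impulse response."""
    # Synchronisation point: the sample with the largest magnitude.
    peak = max(range(len(pitch_vec)), key=lambda n: abs(pitch_vec[n]))
    segment = pitch_vec[peak:peak + length]
    if window is None:
        window = [1.0] * len(segment)  # no attenuation by default
    taps = [w * x for w, x in zip(window, segment)]
    if taps[0] == 0:
        return [1.0]  # degenerate case: leave the pulses unshaped
    # Normalise to a unit first tap (assumption; the gain G1 carries the scale).
    return [t / taps[0] for t in taps]


def shape_pulses(pulse_train, taps):
    """Convolve the pulse train with the impulse response, truncated to the subframe."""
    out = [0.0] * len(pulse_train)
    for n, p in enumerate(pulse_train):
        if p == 0:
            continue
        for k, t in enumerate(taps):
            if n + k < len(out):
                out[n + k] += p * t
    return out
```

Because the taps depend only on the pitch vector, the decoder can rebuild the same filter without extra bits, and an encoder can fold the taps into the synthesis filter's impulse response once per subframe before the codebook search.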
【００５８】
図７に、図６の音声符号化システムに対応する音声復号化システムが示される。図６と同一機能を有する部分に同一符号を付して説明すると、図７の音声復号化システムは、ＬＰＣ合成部１２０と、ＬＰＣ逆量子化部１２１と、適応符号帳１４１と、代数構造符号帳からなる雑音符号帳１４４と、パルス整形フィルタ分析部１６１と、パルス整形部１６２と、ピッチ周期強調部１６０と、利得乗算部１０２，１０３および加算部１０４から構成され、図６の音声符号化システムから伝送されてきた符号化ストリームが入力される。
【００５９】
入力された符号化ストリームは、図示しない逆多重化部に入力され、この逆多重化部によって前述した合成フィルタ情報のインデックスＡ、適応符号帳１４１に対する探索で選ばれたピッチベクトルを示すインデックスＢ、雑音符号帳１４４に対する探索で選ばれたパルス列を表すインデックスＣと、利得符号帳に対する探索で選ばれた利得Ｇ０，Ｇ１を示すインデックスＧに分離されて取り出される。ピッチ周期Ｌは、インデックスＢより算出される。
【００６０】
インデックスＡは、ＬＰＣ逆量子化部１２１で復号されて合成フィルタ情報となり、ＬＰＣ合成部１２０に入力される。インデックスＢおよびＣは適応符号帳１４１および雑音符号帳１４４にそれぞれ入力され、これらの符号帳１４１，１４４からピッチベクトルおよびパルス列が出力される。
【００６１】
この場合、雑音符号帳１４４から出力されるパルス列は、適応符号帳１４１の探索で求められたピッチベクトルに基づいてパルス整形フィルタ分析部１６１により係数が設定されたパルス整形部１６２により処理された後、必要に応じてピッチ周期強調部１６０によりピッチ周期Ｌの周期性が与えられる。
【００６２】
適応符号帳１４１から出力されるピッチベクトルおよび雑音符号帳１４４から出力され、パルス整形部１６２およびピッチ周期強調部１６０を経たパルス列は、利得乗算部１０２，１０３によりピッチベクトルに対する利得Ｇ０および雑音ベクトルに対する利得Ｇ１がそれぞれ乗じられた後、加算部１０４で加え合わせられ、ＬＰＣ合成部１２０に駆動信号として入力され、このＬＰＣ合成部１２０から合成された復号音声信号が出力される。利得Ｇ０，Ｇ１は、インデックスＧに従って図示しない利得符号帳から選ばれる。
【００６３】
このように本実施形態によると、パルス整形部１６２を用いることで、雑音符号帳１４４に低符号化レート化によってパルス数が減少した代数構造符号帳を用いた場合においても、復号音声の音質を維持したまま符号化レートだけを効果的に削減することが可能になる。
【００６４】
図８に、本発明の第３の実施形態に係る音声符号化システムが示される。この音声符号化システムは、第１の実施形態の構成に第２の実施形態で説明したパルス整形フィルタ分析部１６１とパルス整形部１６２を加えた構成になっている。
【００６５】
次に、本実施形態の処理手順について説明すると、第１の実施形態と同様にまずＬＰＣ分析およびＬＰＣ量子化が行われ、適応符号帳１４１の探索が完了した後、ピッチベクトルがパルス位置候補探索部１４２とパルス整形フィルタ分析部１６１に渡される。パルス位置候補探索部１４２では、第１の実施形態で述べた方法を用いてパルス位置候補が求められ，適応代数構造符号帳１４３が作られる。パルス整形フィルタ分析部１６１では、第２の実施形態で述べたようにパルス整形部１６２の係数が求められる。
【００６６】
適応代数構造符号帳１４３の探索では、出力されたパルス列はパルス整形部１６２で整形される。実際の探索では、パルス整形部１６２やピッチ周期強調部１６０のインパルス応答はＬＰＣ合成部１２０と合わせられ、計算量の削減が行われる。
【００６７】
図９に、図８の音声符号化システムに対応する音声復号化システムが示される。この音声復号化システムの動作は第１および第２の実施形態で説明した音声復号化システムの動作から自明であるので、図１、図７および図８と同一部分に同一符号を付して詳細な説明は省略する。
【００６８】
このように本実施形態では、第１の実施形態で説明したパルス位置候補探索部１４２および適応代数構造符号帳１４３と、第２の実施形態で説明したパルス整形フィルタ分析部１６１およびパルス整形部１６２を同時に用いることで、限られた位置候補に少数のパルスを立てる場合でも高い音質を維持することが可能となり、高音質、低符号化レートの音声符号化方式を実現することができる。
【００６９】
図１０に本発明の第４の実施形態に係る音声符号化システムのブロック図を示す。この音声符号化システムでは、第１の実施形態のパルス位置候補探索部がピッチベクトル平滑部１７１と位置候補密度関数算出部１７２および位置候補算出部１７３から構成されている他は、第１の実施形態と同じ構成である。
【００７０】
次に、本実施形態の処理手順について説明すると、第１の実施形態と同様に、まずＬＰＣ分析およびＬＰＣ量子化と、適応符号帳１４１の探索が完了した後、ピッチベクトルがパルス位置候補探索部１４２のピッチベクトル平滑部１７１に渡される。ピッチベクトル平滑部１７１ではピッチベクトルに対し、例えば図２のフローチャートのステップＳ１〜Ｓ２の処理を行い、ピッチベクトルのパワ包絡を求め、これを出力する。位置候補密度関数算出部１７２ではパワ包絡を位置候補密度関数に変換し、出力する。位置候補算出部１７３ではパワ包絡の代わりにこの位置候補密度関数を用いてパルス位置候補を算出し、得られたパルス位置候補に従って適応代数構造符号帳１４３を作る。以降の処理は第１の実施形態と同様である。
【００７１】
本実施形態の特徴は、パルス位置候補探索部１４２の処理の方法にある。第１の実施形態ではピッチベクトルのパワ包絡をそのまま用いてパルス位置候補の適応化を行っていたのに対し本実施形態ではパワ包絡を位置候補密度関数に変換した後これを用いて適応化を行っている。図１１を用いて詳しく説明する。図１１（ａ）がピッチベクトル平滑化部１７１から出力されたピッチベクトルのパワ包絡である。位置候補密度関数算出部１７２では、ピッチベクトルのパワ包絡（図１１（ａ））から位置候補密度関数（図１１（ｂ））を生成する。この時、図１１（ｃ）に示したパワ包絡の値（ｘ）と位置候補密度関数の値（ｆ（ｘ））の対応を示す関数ｆを用いて変換を行う。関数ｆの作成方法は例えば多くの学習音声を処理する事で統計的に求めておく方法などがあげられる。
また、関数の代わりにテーブルデータ等を用いることも可能である。
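The conversion from power envelope to position candidate density can be sketched with a small lookup table, as the text suggests. The table values below are hypothetical placeholders (the text obtains f statistically from training speech), and linear interpolation between table entries is an assumption of this sketch.

```python
def apply_density_function(envelope, table_x, table_f):
    """Map each power-envelope value x to f(x) using a table with linear interpolation."""
    def f(x):
        if x <= table_x[0]:
            return table_f[0]
        if x >= table_x[-1]:
            return table_f[-1]
        for i in range(1, len(table_x)):
            if x <= table_x[i]:
                t = (x - table_x[i - 1]) / (table_x[i] - table_x[i - 1])
                return table_f[i - 1] + t * (table_f[i] - table_f[i - 1])

    return [f(x) for x in envelope]
```

The output would then replace the smoothed power in the integration step of the first embodiment. With an identity table (`table_f` equal to `table_x`) the mapping leaves in-range envelope values unchanged, so the procedure reduces to the first embodiment.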
【００７２】
パルス位置候補探索部１４２は変換用の関数ｆも含めて、符号器と復号器にそれぞれ同一のものを用意するので、適応化に関する情報は送る必要がなく、適応化を行わない場合と比べてビットレートの増加は無い。
【００７３】
図１２に図１０の音声符号化システムに対応する本実施形態の音声復号化システムの構成を示す。この音声復号化システム動作は第１〜３の実施形態で説明した音声復号化システムの動作から自明であるので詳細な説明は省略する。
【００７４】
このように本実施形態ではピッチベクトルのパワ包絡の値とパルス位置候補の密度を関数ｆを用いて変換するため、第１の実施形態に比べて処理手順は僅かに複雑になるが、より正確な位置候補の配分が可能となる。また、第１の実施形態は、本実施形態においてｘ＝ｆ（ｘ）とした場合と考えることができる。
【００７５】
図１３に本発明の第５の実施形態に係る音声符号化システムのブロック図を示す。この音声符号化システムでは、第１の実施形態のパルス位置候補探索部がピッチフィルタ逆演算部１７４と平滑化部１７５および位置候補算出部１７３から構成されている他は、第１の実施形態と同じ構成である。
【００７６】
次に、本実施形態の処理手順について説明すると、第１の実施形態と同様にまず、ＬＰＣ分析およびＬＰＣ量子化と、適応符号帳１４１の探索が完了した後、ピッチベクトルがパルス位置候補探索部１４２のピッチフィルタ逆演算部１７４に渡される。ピッチフィルタ逆演算部１７４はピッチ周期強調部１６０の逆特性を表す演算を行う。例えばピッチフィルタの伝達関数Ｐ（ｚ）が
$$P(z) = 1 - a z^{-L} \tag{1}$$
で与えられる場合、ピッチフィルタ逆演算部１７４では伝達関数Ｑ（ｚ）が
$$Q(z) = \frac{1}{1 - b\,a\,z^{-L}} \tag{2}$$
で与えられるフィルタを用いる方法が挙げられる。ここでａは定数、ｂは逆特性の度合を表し、ｂ＝１の時Ｑ（ｚ）はＰ（ｚ）の逆フィルタとなる。入力されたピッチベクトルは逆演算が施された後、出力され、平滑化部１７５で実施形態４のピッチベクトル平滑化部１７１と同様の手法でパワ包絡が求められる。位置候補算出部１７３ではこのパワ包絡に従ってパルス位置候補を選択し、適応代数構造符号帳１４３を作る。以降の処理は実施形態１と同様である。
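The two transfer functions above translate directly into difference equations: P(z) gives y(n) = x(n) - a x(n-L), and Q(z) gives y(n) = x(n) + b a y(n-L). The sketch below implements both, assuming zero filter state before the start of the vector and an integer lag; the function names are hypothetical.

```python
def pitch_filter(x, a, lag):
    """P(z) = 1 - a z^-L, i.e. y[n] = x[n] - a * x[n - L] (zero initial state assumed)."""
    return [x[n] - (a * x[n - lag] if n >= lag else 0.0) for n in range(len(x))]


def inverse_pitch_filter(x, a, b, lag):
    """Q(z) = 1 / (1 - b a z^-L), i.e. y[n] = x[n] + b * a * y[n - L]."""
    y = []
    for n, xn in enumerate(x):
        feedback = b * a * y[n - lag] if n >= lag else 0.0
        y.append(xn + feedback)
    return y
```

With `b = 1.0` the second function undoes the first exactly, and with `b = 0.0` it leaves its input unchanged, so `b` interpolates between no compensation and full inversion.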
【００７７】
本実施形態の特徴はピッチ周期強調部１６０の影響を考慮したピッチベクトルをパルス位置候補の適応化に用いる点である。このようにすることで効率が上がる理由を述べる。
【００７８】
適応代数構造符号帳から生成された雑音ベクトルはピッチ周期強調部１６０でピッチ周期化がされる。周期化に式（１）を用いた場合、サブフレームの先頭に近いパルスはピッチ周期間隔でサブフレーム内で何度も繰り返されるのに対し、後半のパルスほど繰り返される回数が少なくなる。実際に得られた雑音符号ベクトルを観測すると、強いピッチフィルタが用いられる場合ほど先頭に近い位置にパルスが立ちやすい傾向があることが確認できる。このことから、パルス位置はピッチベクトルの形状だけでなく、ピッチフィルタとも関係が深いことがわかる。本実施形態ではピッチフィルタ逆演算部１７４を用いることにより、ピッチ周期強調部１６０の影響を考慮したパルス位置候補の適応化を実現している。
【００７９】
ところで、第３の実施形態では雑音ベクトルにパルス整形フィルタとピッチフィルタの２種類のフィルタをかけることが可能である。このような場合に本実施形態を適用する場合は、２つのフィルタを合わせた特性を求め、この特性の逆特性をピッチフィルタ逆演算部に用いるのが理想的である。しかし、処理量が増えるため影響の大きなピッチフィルタの特性のみを用いるだけでも効果は得られる。また、ピッチフィルタ逆演算部１７４と平滑化部１７５の順序は逆でも実現可能である。
【００８０】
図１４に図１３の音声符号化システムに対応する本実施形態の音声復号化システムの構成を示す。この音声符号化システムの動作は第１乃至４実施形態で説明した音声復号化システムの動作から自明であるので詳細な説明は省略する。
【００８１】
図１５に本発明の第６の実施形態に係る音声符号化システムのブロック図を示す。この音声符号化システムでは、第１の実施形態の適応代数構造符号帳が雑音ベクトル生成部１８０と振幅符号帳１８１に置き替わっている他は、第１の実施形態と同じ構成である。
【００８２】
次に、本実施形態の処理手順について説明すると、第１の実施形態と同様にまずＬＰＣ分析およびＬＰＣ量子化と、適応符号帳１４１の探索が完了した後、ピッチベクトルがパルス位置探索部１７４に渡される。パルス位置探索部１７４では第１の実施形態と同様の手法でピッチベクトルのパワ包絡に基づきパルス位置を求め、雑音ベクトル生成部にこれを出力する。ここで、本実施形態がこれまでの実施形態と異なる点はパルス位置探索部１７４で得られた位置には雑音ベクトル探索部で全てパルスが立てられる点である。つまり、これまでの実施形態ではパルス位置の候補が求められ、この中から適応代数構造符号帳で最適なパルス位置を選んでいたのに対し、本実施形態ではパルス位置の候補の全部を同時に用いる。従ってパルス位置を選ぶ処理は不要になる。その代わりに、各パルスの振幅を振幅符号帳１８１から選ぶ処理が追加される。また、出力信号もパルス位置を示す情報ｃの代わりにパルスの振幅を表す情報Ｄが出力される。
【００８３】
図１６を用いて雑音ベクトルの生成方法を詳しく説明する。図１６（ａ）に振幅符号帳から得られた振幅パターンを矢印で示す。この場合、７本のパルスを立てることを想定している。図１６（ｂ）と図１６（ｃ）の波形はパルス位置探索部１７４で得られたピッチベクトルパワ包絡とこれに対応するパルス位置（図の○印）である。図１６（ｂ）ではパワの山が２箇所あるため７個のパルス位置が２箇所に分散されているのに対し、図１６（ｃ）では山が中央に１箇所あるので中央にパルス位置が集中している。図１６（ｄ）と図１６（ｅ）はそれぞれのパルス位置に図１６（ａ）の振幅のパルスを立てられた雑音ベクトルである。ピッチベクトルパワ包絡に合わせて駆動信号の形状も変化することが分る。既に述べたようにピッチベクトルのパワ包絡の情報は伝送する必要がないため、本実施形態ではビットレートの増加を伴わずに雑音ベクトルの形状を理想的な雑音ベクトルの形に近づけることができる。
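The sixth embodiment's vector construction (a pulse at every adapted position, with amplitudes taken from an amplitude codebook) can be sketched as follows. The plain squared-error criterion is a simplification assumed here; an actual encoder would evaluate each candidate through the synthesis filter with perceptual weighting. The function names and the codebook contents in the usage note are hypothetical.

```python
def noise_vector_from_amplitudes(subframe_len, positions, amplitudes):
    """Place one pulse at every position, taking amplitudes in order of increasing position."""
    if len(positions) != len(amplitudes):
        raise ValueError("one amplitude is needed per pulse position")
    vec = [0.0] * subframe_len
    for pos, amp in zip(sorted(positions), amplitudes):
        vec[pos] = amp
    return vec


def search_amplitude_codebook(target, positions, codebook):
    """Return the index of the amplitude pattern closest to the target (squared error)."""
    best_index, best_error = -1, float("inf")
    for index, pattern in enumerate(codebook):
        candidate = noise_vector_from_amplitudes(len(target), positions, pattern)
        error = sum((t - c) ** 2 for t, c in zip(target, candidate))
        if error < best_error:
            best_index, best_error = index, error
    return best_index
```

The same amplitude pattern yields different vector shapes when the positions change, which is the effect illustrated by FIG. 16 (d) and (e); only the pattern index needs to be transmitted because the positions are derived from the pitch vector on both sides.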
【００８４】
本実施形態ではビットレートが高くなるに従ってパルスの振幅情報Ｄも多く送れるようになり品質も向上するが、向上の度合は鈍くなっていく。ある程度高いビットレートでは、振幅情報を増やすよりも選ばれなかった位置にパルスを立てた雑音ベクトルも探索の候補に含めた方が性能が向上する場合がある。具体的には、パルス位置探索部１７４は異なるパルス位置のパターン（パルスパターン）を出力し、雑音ベクトル生成部ではパルスパターンごとに振幅を探索する。パルスパターンは前述のピッチベクトルに適応化させたパルスパターンの他に、このパルスパターンに選ばれなかったパルス位置から生成されたパルスパターンも用意する。例えばサブフレームの全サンプル位置から適応化で選ばれたサンプル位置を引いた残りを第２のパルスパターンとして２種類のパルスパターンに対して振幅の探索を行う方法が挙げられる。振幅情報に割り当てられるビット数は各パルスパターンごとに異なる構成にすることも可能であり、通常適応化を用いたパルスパターンの方に多くのビットを配分した方が効率が良い。複数のパルスパターンを用いた場合、どのパルスパターンを用いたかを表す情報を情報Ｄに含めて伝送する必要があり、その分、振幅情報が減ってしまうが、単一のパルスパターンのみを探索するより品質が良い。
【００８５】
図１７に図１５の音声符号化システムに対応する本実施形態の音声復号化システムの構成を示す。この音声復号化システム動作は第１〜５の実施形態で説明した音声復号化システムの動作から自明であるので詳細な説明は省略する。
【００８６】
なお、上述の実施形態では音声符号化／復号化方法について説明したが、本発明は音声合成方法にも適用でき、その場合は図５、図７および図９に示した音声復号化システムにおいて、各インデックスを合成したい再生音声信号に基づいて与えればよい。
【００８７】
【発明の効果】
以上説明したように、本発明によれば低符号化レート化によってパルス位置やパルス数が削減された代数構造符号帳を用いても、高音質の音声符号化／復号化を行うことができる。
【図面の簡単な説明】
【図１】本発明の第１の実施形態に係る音声符号化システムのブロック図
【図２】第１の実施形態におけるパルス位置候補の選択手順を示すフローチャート
【図３】図２の各ステップでの処理の様子を示す図
【図４】第１の実施形態におけるピッチベクトルのパワ包絡とパルス位置候補の関係を示す図
【図５】第１の実施形態に係る音声復号化システムのブロック図
【図６】本発明の第２の実施形態に係る音声符号化システムのブロック図
【図７】第２の実施形態に係る音声復号化システムのブロック図
【図８】本発明の第３の実施形態に係る音声符号化システムのブロック図
【図９】第３の実施形態に係る音声復号化システムのブロック図
【図１０】本発明の第４の実施形態に係る音声符号化システムのブロック図
【図１１】ピッチベクトルパワ包絡、位置候補密度関数、パワー包絡の値と位置候補密度関数の値の関係をそれぞれ示す図
【図１２】第４の実施形態に係る復号システムのブロック図
【図１３】本発明の第５の実施形態に係る音声符号化システムのブロック図
【図１４】第５の実施形態に係る復号システムのブロック図
【図１５】本発明の第６の実施形態に係る音声符号化システムのブロック図
【図１６】雑音ベクトル生成方法を説明するための図
【図１７】第６の実施形態に係る復号システムのブロック図
【符号の説明】
１０１…音声入力端子
１０２，１０３…利得乗算部
１０４，１０５…加算部
１１０…ＬＰＣ分析部
１１１…ＬＰＣ量子化部
１２０…ＬＰＣ合成部
１３０…聴覚重み付け部
１４１…適応符号帳
１４２…パルス位置候補探索部
１４３…適応代数構造符号帳
１４４…雑音符号帳
１５０…符号選択部
１６０…ピッチ周期強調部
１６１…パルス整形フィルタ分析部
１６２…パルス整形部
１７１…ピッチベクトル平滑部
１７２…位置候補密度関数算出部
１７３…位置候補算出部
１７４…パルス位置探索部
１８０…雑音ベクトル生成部
１８１…振幅符号帳
[0001]
[Technical field of the invention]
The present invention relates to a low coding rate speech encoding / decoding method used for digital telephones, voice memos, and the like.
[0002]
[Prior art]
In recent years, the CELP (Code Excited Linear Prediction) method (M. R. Schroeder and B. S. Atal, "Code Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates," Proc. ICASSP, pp. 937-940, 1985 (Reference 1), and W. S. Kleijin, D. J. Krasinski et al., "Improved Speech Quality and Efficient Vector Quantization in SELP," Proc. ICASSP, pp. 155-158, 1988 (Reference 2)) has been widely used as a coding technique for compressing speech and musical sound into a small amount of information for transmission and storage in mobile phones, on the Internet, and elsewhere.
[0003]
CELP is a coding method based on linear prediction analysis, and an input speech signal is divided into a linear prediction coefficient representing phoneme information and a prediction residual signal representing a pitch or the like by linear prediction analysis. A recursive digital filter called a synthesis filter is configured based on the linear prediction coefficients, and the original input audio signal can be restored by inputting a prediction residual signal as a drive signal to the synthesis filter.
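The relationship between the prediction residual and the recursive synthesis filter can be illustrated with a short sketch. The sign convention used here (prediction = sum of a_k times past samples) and the zero initial state are assumptions; the function names are hypothetical, and no quantization is applied, so the reconstruction is exact apart from floating-point rounding.

```python
def lpc_residual(signal, lpc):
    """Analysis filter: e[n] = s[n] - sum_k lpc[k] * s[n-1-k] (zero initial state)."""
    residual = []
    for n, s in enumerate(signal):
        predicted = sum(lpc[k] * signal[n - 1 - k]
                        for k in range(len(lpc)) if n - 1 - k >= 0)
        residual.append(s - predicted)
    return residual


def lpc_synthesize(excitation, lpc):
    """Recursive synthesis filter: s[n] = e[n] + sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        predicted = sum(lpc[k] * out[n - 1 - k]
                        for k in range(len(lpc)) if n - 1 - k >= 0)
        out.append(e + predicted)
    return out
```

Driving `lpc_synthesize` with the output of `lpc_residual` for the same coefficients returns the original samples; a CELP coder replaces that exact residual with a coarsely coded drive signal.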
[0004]
To encode at a low rate, both the linear prediction coefficients, which are the synthesis filter information representing the characteristics of the synthesis filter, and the prediction residual signal, which is the drive signal that drives the synthesis filter, must be encoded with a smaller amount of information. In the CELP method, the drive signal, which is an encoded version of the prediction residual signal, is generated by multiplying two kinds of signals, a pitch vector and a noise vector, by appropriate gains and then adding them. A method of generating the pitch vector is described in, for example, Reference 2.
[0005]
Besides the method of Reference 2, methods such as using a fixed code vector at a speech onset have also been proposed; in the present invention, all of these are collectively referred to as the pitch vector.
The noise vector is usually generated by storing a large number of candidates in a noise codebook and selecting the optimum one among them. In the noise vector search, every noise vector is added to the pitch vector and passed through the synthesis filter to generate a synthesized speech signal, the distortion of this synthesized speech signal with respect to the input speech signal is evaluated, and the noise vector that yields the synthesized speech signal with the smallest distortion is selected. How efficiently the noise vectors are stored in the noise codebook is therefore a key point of the CELP method.
[0006]
The algebraic structure codebook (Algebraic Codebook) (J-P. Adoul et al., "Fast CELP coding based on algebraic codes," Proc. ICASSP '87, pp. 1957-1960 (Reference 3)) has a simple structure in which a noise vector is represented only by the presence or absence of pulses and their polarity (+, -). Compared with a method that uses a noise codebook storing a plurality of noise vectors, the algebraic structure codebook does not need to store code vectors and requires little computation. Because its sound quality is also comparable to that of conventional methods, it has been adopted in various standards in recent years.
[0007]
[Problems to be solved by the invention]
However, with the algebraic structure codebook, the deterioration of sound quality becomes noticeable as the coding bit rate (coding rate) decreases. One reason is a lack of pulse position information. Because the algebraic structure codebook simplifies the pulse position information algebraically, it has the advantages described above; at a low coding rate, however, position candidates may exist where no pulse is needed and be missing where one is needed, so that not only is the coding inefficient but the speech quality also deteriorates.
[0008]
Another reason that sound quality deteriorates when an algebraic structure codebook is used is a shortage of pulses. When there are too few pulses, a crackling noise becomes noticeable in the decoded speech. This is because the drive signal is generated from a pulse train, and as the number of pulses decreases, the presence or absence of individual pulses becomes easier to perceive. To improve the sound quality, this crackling must be reduced.
[0009]
As described above, the conventional algebraic structure codebook has the advantages of a simple structure and a small amount of computation. On the other hand, at low coding rates the sound quality of the decoded speech deteriorates because the pulse train that forms the drive signal of the synthesis filter has too little position information and too few pulses.
[0010]
An object of the present invention is to provide a speech encoding / decoding method capable of obtaining good sound quality even at a low coding rate.
[0011]
[Means for Solving the Problems]
The present invention provides a speech encoding method comprising a step of generating, from a speech signal, at least information representing the characteristics of a synthesis filter, and a step of generating a drive signal for driving the synthesis filter, the drive signal including a pulse train generated by placing pulses at a predetermined number of pulse positions selected from pulse position candidates that change adaptively according to the properties of the speech signal.
[0012]
The present invention also provides a speech decoding method in which a drive signal including a pulse train, generated by placing pulses at a predetermined number of pulse positions selected from pulse position candidates that change adaptively according to the properties of a speech signal, is input to a synthesis filter to decode the speech signal.
[0013]
In the speech encoding / decoding method according to the present invention, the drive signal that drives the synthesis filter includes a pulse train generated by placing pulses at a predetermined number of pulse positions selected from pulse position candidates that change adaptively according to the properties of the speech signal. More specifically, the pulse position candidates are arranged so that more candidates exist where the power of the speech signal is larger.
[0014]
The drive signal may also be configured to include a pulse train generated by placing pulses at all of the pulse position candidates that change adaptively according to the properties of the speech signal and optimizing the amplitude of each pulse by predetermined means. In this case, more specifically, the pulse position candidates are arranged so that more candidates exist where the power of the speech signal is larger.
[0015]
Further, the drive signal may be generated using either a pulse train generated by placing pulses at a predetermined number of pulse positions selected from first pulse position candidates that change adaptively according to the properties of the speech signal, or a pulse train generated by placing pulses at a predetermined number of pulse positions selected from second pulse position candidates consisting of some or all of the positions not used as the first pulse position candidates. In this case, more specifically, the first pulse position candidates are arranged so that more candidates exist where the power of the speech signal is larger.
[0016]
When the drive signal consists of a pitch vector and a noise vector, the noise vector is generated by placing pulses at a predetermined number of pulse positions selected from pulse position candidates that change according to the shape of the pitch vector. In this case, more specifically, the pulse position candidates are arranged so that more candidates exist where the power of the pitch vector is larger.
[0017]
The noise vector may also be constructed from a pulse train generated by placing pulses at a predetermined number of pulse positions selected from position candidates set on the basis of a position candidate density function obtained from the shape of the pitch vector. In this case, more specifically, the pulse position candidates are arranged so that more candidates exist where the value of the position candidate density function is larger, and the position candidate density function is a function, obtained in advance, that relates the power of the pitch vector to the probability that a pulse is placed.
[0018]
Further, when correction means such as a pitch period emphasis filter is applied to the noise vector, the noise vector is generated by placing pulses at a predetermined number of pulse positions selected from pulse position candidates that change according to the shape of an inversely corrected pitch vector, obtained by applying processing based on the inverse characteristic of the correction means to the pitch vector. In this case, more specifically, the pulse position candidates are arranged so that more candidates exist where the power of the inversely corrected pitch vector is larger.
[0019]
By adaptively changing the pulse position candidates according to properties such as the power distribution of the speech signal in this way, the coding efficiency improves even when an algebraic structure codebook is used whose pulse positions and number of pulses have been reduced for a lower coding rate, so the coding rate can be lowered while the sound quality of the decoded speech is maintained. In addition, using the pitch vector to create the pulse position candidates makes it possible to adapt the candidates without any additional information.
[0020]
In another speech encoding / decoding method according to the present invention, when the drive signal consists of a pitch vector and a noise vector, a drive signal is generated that includes a pulse train shaped by pulse shaping means whose characteristics are determined from the shape of the pitch vector.
[0021]
With such a configuration, the pulse-like noise that a reduced number of pulses introduces into the decoded speech is reduced, so that the coding rate can be lowered while the sound quality of the decoded speech is maintained even when the pulse positions and the number of pulses are reduced for a lower coding rate.
[0022]
Furthermore, in the speech encoding / decoding method according to the present invention, a drive signal may be generated that includes a pulse train generated by placing pulses at a predetermined number of pulse positions selected from pulse position candidates that change adaptively according to the properties of the speech signal, and this pulse train may be shaped by pulse shaping means whose characteristics are determined from the shape of the pitch vector.
[0023]
[Embodiments of the invention]
FIG. 1 shows a speech encoding system to which the speech encoding method according to the first embodiment is applied. This speech encoding system comprises input terminals 101 and 106, an LPC analysis unit 110, an LPC quantization unit 111, an LPC synthesis unit 120, a perceptual weighting unit 130, an adaptive codebook 141, a pulse position candidate search unit 142, an adaptive algebraic structure codebook 143, a code selection unit 150, a pitch period emphasizing unit 160, gain multiplying units 102 and 103, and adding units 104 and 105.
[0024]
An input speech signal to be encoded is supplied to the input terminal 101 in units of one frame, and in synchronization with this the LPC analysis unit 110 performs linear prediction analysis to obtain linear prediction coefficients (LPC coefficients) corresponding to the vocal tract characteristics. The LPC coefficients are quantized by the LPC quantization unit 111; the quantized values are input to the LPC synthesis unit 120 as synthesis filter information representing its characteristics, and an index A indicating the quantized values is output as an encoding result to a multiplexing unit (not shown).
[0025]
The adaptive codebook 141 stores drive signals that have been input to the LPC synthesis unit 120 in the past. The drive signal input to the LPC synthesis unit 120 is a signal obtained by quantizing the prediction residual signal in the linear prediction analysis, and corresponds to a vocal cord signal including information of a sound pitch. The adaptive codebook 141 cuts out a waveform having a length corresponding to a pitch period from a past drive signal, and generates a pitch vector by repeating this. The pitch vector is usually obtained in units of subframes obtained by dividing a frame into several parts.
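The pitch vector construction described above (cut one pitch period out of the past drive signal and repeat it) can be sketched as follows, assuming an integer pitch lag and taking the most recent samples of the stored drive signal as the segment; fractional lags and the function name are outside what the text specifies.

```python
def pitch_vector(past_excitation, pitch_lag, subframe_len):
    """Repeat the last `pitch_lag` samples of the past drive signal to fill one subframe."""
    if not 0 < pitch_lag <= len(past_excitation):
        raise ValueError("pitch_lag must be between 1 and the stored history length")
    segment = past_excitation[-pitch_lag:]
    return [segment[n % pitch_lag] for n in range(subframe_len)]
```

For a stored history `[0, 1, 2, 3, 4, 5]`, a lag of 3 and a subframe of 8 samples, the result repeats `3, 4, 5` until the subframe is full.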
[0026]
The pulse position candidate search unit 142 calculates, from the pitch vector obtained by the adaptive codebook 141, the positions within the subframe at which pulse position candidates are to be set, and outputs the result to the adaptive algebraic structure codebook 143.
[0027]
The adaptive algebraic structure codebook 143 searches the pulse position candidates supplied from the pulse position candidate search unit 142 for the positions and signs of a predetermined number of pulses such that the distortion with respect to the input speech signal, from which the contribution of the pitch vector has been subtracted, is minimized under perceptual weighting.
[0028]
The pulse train output from the adaptive algebraic structure codebook 143 is made periodic at the pitch period by the pitch period emphasizing unit 160 as necessary. The pitch period emphasizing unit 160 receives, from the input terminal 106, the pitch period information L obtained in the search of the adaptive codebook 141, and gives the pulse train the periodicity of the pitch period.
[0029]
The pitch vector output from the adaptive codebook 141 and the pulse train output from the adaptive algebraic structure codebook 143 (given periodicity by the pitch period emphasizing unit 160 as necessary) are multiplied in the gain multiplying units 102 and 103 by a gain G0 for the pitch vector and a gain G1 for the noise vector, respectively, then added by the adding unit 104 and input to the LPC synthesis unit 120 as the drive signal. The gains G0 and G1 are usually selected as the optimum pair from a gain codebook (not shown) storing a plurality of gains.
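The gain stage and synthesis filter described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the direct-form all-pole recursion, and the sign convention A(z) = 1 + sum_k a_k z^-k are choices made for the example and are not specified in this description.

```python
def build_drive_signal(pitch_vector, noise_vector, g0, g1):
    """Drive signal = G0 * pitch vector + G1 * noise vector (units 102, 103, 104)."""
    return [g0 * p + g1 * c for p, c in zip(pitch_vector, noise_vector)]


def lpc_synthesize(drive, lpc_coeffs):
    """All-pole synthesis: y[n] = x[n] - sum_k a_k * y[n-k].

    The sign convention and zero initial filter state are assumptions.
    """
    out = []
    for n, x in enumerate(drive):
        acc = x
        for k, a in enumerate(lpc_coeffs, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


drive = build_drive_signal([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], g0=2.0, g1=0.5)
speech = lpc_synthesize(drive, [-0.5])
```

In an encoder the same two functions would be evaluated for each candidate noise vector and gain pair during the codebook search.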
[0030]
The code selection unit 150 outputs an index B indicating the pitch vector selected in the search of the adaptive codebook 141, an index C indicating the pulse train selected in the search of the adaptive algebraic structure codebook 143, and an index G indicating the gains G0 and G1 selected in the search of the gain codebook. These indices B, C, and G, together with the index A indicating the synthesis filter information (the quantized LPC coefficients from the LPC quantization unit 111), are multiplexed by a multiplexing unit (not shown) and output as a bit stream.
[0031]
Next, the pulse position candidate search unit 142 and the adaptive algebraic structure codebook 143, which are characteristic parts of the present embodiment, will be described.
[0032]
In the present embodiment, the aim is to reduce only the coding rate without the sound quality degradation seen in the prior art, even when the positions at which pulses can be placed are limited at a low coding rate. To this end, the embodiment exploits the property that pulses tend to concentrate at positions where the power of the drive signal is high: pulse position candidates are set for each subframe so that more candidates are allocated where the power of the drive signal is higher.
[0033]
Since the pitch vector is similar in shape to an ideal drive signal, it is effective for the pulse position candidate search unit 142 to set the pulse position candidates based on the pitch vector obtained by searching the adaptive codebook 141. Because the decoding side obtains the same pitch vector as the encoding side, adapting the pulse position candidates requires no extra side information.
[0034]
If position candidates are assigned only where the power is large, sound quality may degrade because a long run of low-power samples is left with no candidates at all. Various methods of adapting the pulse position candidates are conceivable. The following method, for example, achieves adaptation with little sound quality degradation.
[0035]
The procedure by which the pulse position candidate search unit 142 adapts the pulse position candidates is described with reference to the flowchart of FIG. 2. FIG. 3 shows, in correspondence with the steps of FIG. 2, the input pitch vector waveform (F0), the power of this waveform (F1), the smoothed power (F2), and the value obtained by integrating the smoothed power in the sample direction (F3).
[0036]
Similar processing can be performed by using other scales representing the waveform shape such as the absolute value of the amplitude value (the square root of the power) in addition to the power. In the present invention, these are collectively represented by power.
[0037]
First, the power (F1) is calculated for the input pitch vector (F0) of FIG. 3 (step S1), and then the power (F1) is smoothed to obtain a smoothed power (F2) (step S2). For power smoothing, for example, there is a method of taking a moving average by weighting in a window of several samples.
[0038]
Next, the power smoothed in step S2 is integrated in the sample direction (step S3), as shown in (F3) of FIG. 3. Specifically, letting p(n) be the smoothed power of the n-th sample, q(n) the integrated value of the smoothed power, and L the subframe length, the integrated value q(n) is obtained as
$$q(n) = p(n) + q(n-1) + C, \quad n = 0, \ldots, L-1$$
Here, C is a constant that adjusts the degree of bias in the density of the pulse position candidates.
[0039]
Next, the pulse position candidates are calculated using the integrated value q(n) (step S4). The integrated value is first normalized so that its value at the final sample equals the number of position candidates to be obtained, M. The position Sm of the m-th candidate is then found by mapping m onto the normalized integrated value, as shown in (F3) of FIG. 3. Repeating this for m = 0, ..., M−1 yields M position candidates.
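Steps S1 to S4 can be sketched as follows. This is a minimal illustration, not the procedure of FIG. 2 itself. Several details are assumptions because the description leaves them open: the three-tap smoothing window, the initial value q(-1) = 0, the rule that Sm is the first sample whose normalized integral reaches m + 0.5, and the clamping that keeps the M positions distinct. The function and parameter names are hypothetical.

```python
def pulse_position_candidates(pitch_vector, num_candidates, bias=0.0,
                              window=(0.25, 0.5, 0.25)):
    """Return num_candidates distinct sample positions, denser where power is high."""
    length = len(pitch_vector)
    if not 0 < num_candidates <= length:
        raise ValueError("need 0 < num_candidates <= subframe length")

    # Step S1: power of each pitch vector sample.
    power = [x * x for x in pitch_vector]

    # Step S2: weighted moving average over a short window (assumed weights).
    half = len(window) // 2
    smoothed = []
    for n in range(length):
        acc = 0.0
        for k, w in enumerate(window):
            i = n + k - half
            if 0 <= i < length:
                acc += w * power[i]
        smoothed.append(acc)

    # Step S3: q(n) = p(n) + q(n-1) + C, taking q(-1) = 0 (assumption).
    q, running = [], 0.0
    for p in smoothed:
        running += p + bias
        q.append(running)
    if q[-1] <= 0.0:
        # Silent pitch vector and no bias: fall back to a uniform ramp.
        q = [float(n + 1) for n in range(length)]

    # Step S4: normalize so that q(L-1) = M, then map each m to a position.
    scale = num_candidates / q[-1]
    positions, n, last = [], 0, -1
    for m in range(num_candidates):
        while n < length - 1 and q[n] * scale < m + 0.5:
            n += 1
        pos = min(n, length - (num_candidates - m))  # leave room for the rest
        pos = max(pos, last + 1)                     # keep positions distinct
        positions.append(pos)
        last = pos
    return positions


pitch = [0.1] * 20
pitch[8:12] = [1.0, 1.0, 1.0, 1.0]
dense = pulse_position_candidates(pitch, 4)              # clustered on the peak
spread = pulse_position_candidates(pitch, 4, bias=10.0)  # larger C, flatter density
```

With C = 0 the four candidates fall on the high-power samples, while a large C pulls the distribution toward uniform spacing, which is the role the description assigns to the constant.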
[0040]
FIG. 4 shows the relationship between the pulse position candidates thus obtained and the power of the pitch vector. The solid line indicates the power envelope of the pitch vector, and the arrows indicate pulse position candidates. As shown in the figure, the distribution of the pulse position candidates becomes dense where the power of the pitch vector is large, and becomes sparse as the power becomes small. As a result, where the power of the pitch vector that is important for sound quality is large, the pulse position can be selected more accurately. Further, even if the number of pulse position candidates decreases due to the lowering of the coding rate, high-quality encoding becomes possible by adaptively concentrating a small number of pulse position candidates in a place where the power of the pitch vector is large.
[0041]
Next, the position candidates thus obtained are distributed among the channels (step S5). Various distribution methods are possible, but as shown in (F4) of FIG. 3, it is desirable to assign the candidates to the channels in alternation. The adaptive algebraic structure codebook 143 is obtained in this way. In the search, the optimum position and sign are selected for one pulse from each channel (Ch1, Ch2, Ch3) of the adaptive algebraic structure codebook 143, generating a noise vector composed of three pulses.
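The alternating distribution of step S5 and the resulting three-pulse noise vector can be sketched as follows. The round-robin slicing is one plausible reading of "alternation", and the function names and example positions are hypothetical.

```python
def distribute_to_channels(candidates, num_channels=3):
    """Assign sorted position candidates to channels in round-robin order."""
    return [candidates[ch::num_channels] for ch in range(num_channels)]


def build_pulse_vector(length, picks):
    """Build a noise vector from one (position, sign) pair per channel.

    sign is +1 or -1, as in a conventional algebraic structure codebook.
    """
    vec = [0] * length
    for pos, sign in picks:
        vec[pos] += sign
    return vec


channels = distribute_to_channels([2, 5, 7, 8, 9, 12, 15])
# One pulse chosen from each channel (the choice here is arbitrary).
noise = build_pulse_vector(16, [(8, +1), (5, -1), (12, +1)])
```

Because neighbouring candidates land in different channels, each channel covers the whole subframe rather than a single region.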
[0042]
When the sub-frame length is 80 samples, even if the pulse candidate positions are reduced to about 40 samples in all channels in total, the above method hardly causes auditory deterioration.
[0043]
In the algebraic structure codebook, the pulse amplitude is usually either +1 or -1, but methods using pulses that carry amplitude information have also been proposed. Reference 4 (Chang Deyuan, "An 8 kb/s low complexity ACELP speech codec," 1996 3rd International Conference on Signal Processing, pp. 671-4, 1996) describes a method of selecting the pulse amplitude from among 1.0, 0.5, 0, -0.5, and -1.0. In the multi-pulse method of Reference 5 (K. Ozawa and T. Araseki, "Low Bit Rate Multi-pulse Speech Coder with Natural Speech Quality," IEEE Proc. ICASSP '86), the drive signal is likewise composed of a pulse train having amplitudes. The present invention is also applicable to cases where the pulses have amplitudes, as in these examples.
[0044]
Next, a speech decoding system corresponding to the speech encoding system of FIG. 1 will be described with reference to FIG. 5.
[0045]
Parts having the same functions as those in FIG. 1 are denoted by the same reference numerals. The speech decoding system of FIG. 5 includes an LPC synthesis unit 120, an LPC inverse quantization unit 121, an adaptive codebook 141, a pulse position candidate search unit 142, an adaptive algebraic structure codebook 143, a pitch period emphasizing unit 160, gain multiplying units 102 and 103, and an adding unit 104, and receives the coded stream transmitted from the speech encoding system of FIG. 1.
[0046]
The input coded stream is supplied to a demultiplexing unit (not shown), which separates and extracts the index A of the synthesis filter information, the index B indicating the pitch vector selected in the search of the adaptive codebook 141, the index C indicating the pulse train selected in the search of the adaptive algebraic structure codebook 143, the index G indicating the gains G0 and G1 selected in the search of the gain codebook, and the index L indicating the pitch period.
[0047]
The index A is decoded by the LPC inverse quantization unit 121 to obtain the LPC coefficients, which are the synthesis filter information, and these are input to the LPC synthesis unit 120. The indices B and C are input to the adaptive codebook 141 and the adaptive algebraic structure codebook 143, respectively, and a pitch vector and a pulse train are output from these codebooks 141 and 143. Here, the adaptive algebraic structure codebook 143 is generated by the pulse position candidate search unit 142 based on the pitch vector input from the adaptive codebook 141. The pulse positions and signs are obtained from this codebook and the index C, and the pulse train is output. The pulse train output from the adaptive algebraic structure codebook 143 is given the periodicity of the pitch period L by the pitch period emphasizing unit 160 as necessary.
[0048]
The pitch vector output from the adaptive codebook 141 and the pulse train output from the adaptive algebraic structure codebook 143 (given periodicity by the pitch period emphasizing unit 160 as necessary) are multiplied in the gain multiplying units 102 and 103 by the gain G0 for the pitch vector and the gain G1 for the noise vector, respectively, then added by the adding unit 104 and input to the LPC synthesis unit 120 as the drive signal. The LPC synthesis unit 120 outputs the reproduced speech signal. The gains G0 and G1 are selected from a gain codebook (not shown) according to the index G.
[0049]
As described above, according to the present embodiment, it is possible to reduce only the bit rate while maintaining the audio quality, and it is possible to realize audio encoding / decoding with high audio quality at a low encoding rate.
[0050]
FIG. 6 shows a speech encoding system according to the second embodiment of the present invention. This speech encoding system removes the pulse position candidate search unit 142 and the adaptive algebraic structure codebook 143 from the configuration of FIG. 1 according to the first embodiment, provides a general noise codebook 144 in place of the adaptive algebraic structure codebook 143, and further adds a pulse shaping filter analysis unit 161 and a pulse shaping unit 162.
[0051]
The processing procedure of the present embodiment is the same as that of the first embodiment up to the point where the input speech signal undergoes LPC analysis and LPC quantization and the adaptive codebook 141 is searched. In this example, the noise codebook 144 is configured as, for example, an algebraic structure codebook.
[0052]
The pulse shaping filter analysis unit 161 determines and outputs a filter coefficient of the pulse shaping unit 162 based on the pitch vector obtained by searching the adaptive codebook 141. Pulse shaping section 162 shapes the output of noise codebook 144 and outputs the result as a noise vector.
[0053]
Similarly to the first embodiment, the noise vector is periodicized using the pitch period emphasizing unit 160 as necessary, the gains G0 and G1 for the pitch vector and the noise vector are determined, and the index is output. Since the filter coefficient of the pulse shaping section 162 is obtained from the pitch vector, no new additional information is required.
[0054]
The feature of this embodiment is that the pulse shaping unit 162 is set based on the waveform of the pitch vector, and pulse shaping is applied to the pulse train output from the noise codebook 144, which is composed of an algebraic structure codebook. As described in the first embodiment, lowering the coding rate reduces the number of pulse positions and the number of pulses, and the deterioration of sound quality becomes conspicuous. When the number of pulses is small, a crackling noise becomes noticeable in the decoded speech, but using the pulse shaping unit 162 as in the present embodiment greatly reduces this crackling.
[0055]
Various methods can be used to design the pulse shaping unit 162. A first example exploits the property that the drive signal of the synthesis filter becomes a pulse-like signal when it is phase-equalized. Conversely, feeding a pulse-like signal into the inverse filter of the phase equalization yields a waveform resembling the drive signal. A disadvantage of conventional pulse waveforms is that they lack the phase information contained in an ideal drive signal, and this problem becomes more conspicuous as the number of pulses decreases. Adding the phase information with the pulse shaping unit 162, as in this example, therefore makes it possible to generate from the pulse waveform a waveform closer to an ideal drive signal.
[0056]
In the first example, the filter coefficients of the phase equalization inverse filter must be transmitted, and the coding rate (bit rate) increases accordingly. A second example of the pulse shaping unit 162 therefore uses the pitch vector as an approximation of the phase information. Since the pitch vector is similar in shape to the drive signal in voiced sections and the like, the phase information can be extracted from it.
[0057]
As a specific method, a pulse shaping filter that determines a synchronization point such as a peak position of a pitch vector, extracts a waveform of several samples from the synchronization point, and uses the waveform as an impulse response can be used. The effect appears when the length of the extracted waveform is about two to three samples. It is also effective to apply a window to the sample taken out and attenuate it. Furthermore, since the same pitch vector can be obtained on the decoding side as on the encoding side, there is also an advantage that a new transmission bit is not required. At the time of searching noise codebook 144, pulse shaping section 162 is constant, so that the amount of calculation can be reduced by calculating the impulse response in advance together with LPC synthesis section 120.
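The second example above can be sketched as follows. This is an illustration under stated assumptions rather than the embodiment itself: the synchronization point is taken as the sample of largest magnitude, the extracted waveform is normalized so that its first tap is 1, and the attenuating window is a linear taper. None of these choices, nor the function names, are fixed by the description.

```python
def pulse_shaping_response(pitch_vector, num_taps=3):
    """Impulse response cut out of the pitch vector, starting at its peak."""
    # Synchronization point: sample with the largest magnitude (assumption).
    peak = max(range(len(pitch_vector)), key=lambda n: abs(pitch_vector[n]))
    segment = pitch_vector[peak:peak + num_taps]
    if segment[0] == 0:
        return [1.0]  # all-zero pitch vector: leave the pulses unshaped
    # Normalize to a unit first tap and apply a linear taper (assumed window).
    return [(x / segment[0]) * (1.0 - k / num_taps)
            for k, x in enumerate(segment)]


def shape_pulses(pulse_train, taps):
    """Convolve the pulse train with the taps, truncated to the subframe."""
    out = [0.0] * len(pulse_train)
    for n, amp in enumerate(pulse_train):
        if amp == 0:
            continue
        for k, tap in enumerate(taps):
            if n + k < len(out):
                out[n + k] += amp * tap
    return out


taps = pulse_shaping_response([0.0, 0.2, -1.0, 0.5, 0.25, 0.0])
shaped = shape_pulses([0, 1, 0, 0, -1, 0], taps)
```

Since the taps depend only on the pitch vector, a decoder can recompute them without receiving any extra bits, which matches the advantage stated above.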
[0058]
FIG. 7 shows a speech decoding system corresponding to the speech encoding system of FIG. 6. Parts having the same functions as those in FIG. 6 are denoted by the same reference numerals. The speech decoding system of FIG. 7 includes an LPC synthesis unit 120, an LPC inverse quantization unit 121, an adaptive codebook 141, a noise codebook 144 composed of an algebraic structure codebook, a pulse shaping filter analysis unit 161, a pulse shaping unit 162, a pitch period emphasizing unit 160, gain multiplying units 102 and 103, and an adding unit 104, and receives the coded stream transmitted from the speech encoding system of FIG. 6.
[0059]
The input coded stream is input to a demultiplexing unit (not shown), and the index A of the synthesis filter information described above by the demultiplexing unit, the index B indicating the pitch vector selected in the search for the adaptive codebook 141, The index C representing the pulse train selected in the search for the noise codebook 144 and the index G indicating the gains G0 and G1 selected in the search for the gain codebook are separated and extracted. The pitch period L is calculated from the index B.
[0060]
The index A is decoded by the LPC inverse quantization unit 121 to become synthesis filter information, and is input to the LPC synthesis unit 120. The indexes B and C are input to the adaptive codebook 141 and the noise codebook 144, respectively, and a pitch vector and a pulse train are output from these codebooks 141 and 144.
[0061]
In this case, the pulse train output from the noise codebook 144 is processed by the pulse shaping unit 162, whose coefficients are set by the pulse shaping filter analysis unit 161 based on the pitch vector obtained by searching the adaptive codebook 141, and is then given the periodicity of the pitch period L by the pitch period emphasizing unit 160 as necessary.
[0062]
The pitch vector output from the adaptive codebook 141 and the pulse train output from the noise codebook 144 and passed through the pulse shaping unit 162 and the pitch period emphasizing unit 160 are multiplied in the gain multiplying units 102 and 103 by the gain G0 for the pitch vector and the gain G1 for the noise vector, respectively, then added by the adding unit 104 and input to the LPC synthesis unit 120 as the drive signal. The LPC synthesis unit 120 outputs the synthesized decoded speech signal. The gains G0 and G1 are selected from a gain codebook (not shown) according to the index G.
[0063]
As described above, according to the present embodiment, using the pulse shaping unit 162 makes it possible to reduce only the coding rate while maintaining the sound quality of the decoded speech, even when an algebraic structure codebook whose number of pulses has been reduced for a lower coding rate is used as the noise codebook 144.
[0064]
FIG. 8 shows a speech coding system according to the third embodiment of the present invention. This speech coding system has a configuration obtained by adding the pulse shaping filter analysis unit 161 and the pulse shaping unit 162 described in the second embodiment to the configuration of the first embodiment.
[0065]
Next, the processing procedure of the present embodiment will be described. As in the first embodiment, LPC analysis and LPC quantization are performed first, and once the search of the adaptive codebook 141 is complete, the pitch vector is passed to the pulse position candidate search unit 142 and the pulse shaping filter analysis unit 161. The pulse position candidate search unit 142 obtains the pulse position candidates by the method described in the first embodiment and creates the adaptive algebraic structure codebook 143. The pulse shaping filter analysis unit 161 obtains the coefficients of the pulse shaping unit 162 as described in the second embodiment.
[0066]
In the search of the adaptive algebraic structure codebook 143, the output pulse train is shaped by the pulse shaping unit 162. In the actual search, the impulse responses of the pulse shaping unit 162 and the pitch period emphasizing unit 160 are combined with that of the LPC synthesis unit 120 in order to reduce the amount of calculation.
[0067]
FIG. 9 shows a speech decoding system corresponding to the speech encoding system of FIG. 8. Since the operation of this speech decoding system is evident from the operation of the speech decoding systems described in the first and second embodiments, parts identical to those in the earlier figures are denoted by the same reference numerals and a detailed description is omitted.
[0068]
Thus, in the present embodiment, the pulse position candidate search unit 142 and the adaptive algebraic structure codebook 143 described in the first embodiment are used together with the pulse shaping filter analysis unit 161 and the pulse shaping unit 162 described in the second embodiment. This makes it possible to maintain high sound quality even when a small number of pulses are placed at limited position candidates, and to realize a speech coding method with high sound quality at a low coding rate.
[0069]
FIG. 10 shows a block diagram of a speech encoding system according to the fourth embodiment of the present invention. In this speech encoding system, the pulse position candidate search unit of the first embodiment is configured from a pitch vector smoothing unit 171, a position candidate density function calculation unit 172, and a position candidate calculation unit 173. The configuration is otherwise the same as that of the first embodiment.
[0070]
Next, the processing procedure of the present embodiment will be described. As in the first embodiment, after LPC analysis, LPC quantization, and the search of the adaptive codebook 141 are complete, the pitch vector is passed to the pitch vector smoothing unit 171 of the pulse position candidate search unit 142. The pitch vector smoothing unit 171 applies, for example, the processing of steps S1 and S2 of the flowchart of FIG. 2 to the pitch vector, obtains the power envelope of the pitch vector, and outputs it. The position candidate density function calculation unit 172 converts the power envelope into a position candidate density function and outputs it. The position candidate calculation unit 173 calculates the pulse position candidates using this position candidate density function instead of the power envelope, and creates the adaptive algebraic structure codebook 143 according to the pulse position candidates obtained. Subsequent processing is the same as in the first embodiment.
[0071]
The feature of the present embodiment lies in the processing method of the pulse position candidate search unit 142. In the first embodiment, the pulse position candidates are adapted using the power envelope of the pitch vector as it is. In the present embodiment, the power envelope is first converted into a position candidate density function, and the adaptation is then performed using that function. This is described in detail with reference to FIG. 11. FIG. 11(a) shows the power envelope of the pitch vector output from the pitch vector smoothing unit 171. The position candidate density function calculation unit 172 generates the position candidate density function (FIG. 11(b)) from the power envelope of the pitch vector (FIG. 11(a)). The conversion uses the function f shown in FIG. 11(c), which gives the correspondence between the value x of the power envelope and the value f(x) of the position candidate density function. One way to create the function f is, for example, to derive it statistically by processing a large amount of training speech. Table data or the like can also be used in place of the function.
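The table-based conversion mentioned above can be sketched as follows. The table values, the uniform spacing of the table over [0, x_max], and the nearest-entry lookup are all assumptions chosen for illustration. In practice the table would come from training data as the description suggests.

```python
def to_density(power_envelope, table, x_max):
    """Map each power envelope value x to a density value f(x) by table lookup.

    table[i] holds f at x = i * x_max / (len(table) - 1). Inputs above x_max
    are clamped to the last entry. The envelope is assumed non-negative.
    """
    last = len(table) - 1
    density = []
    for x in power_envelope:
        idx = int(x / x_max * last + 0.5)  # nearest table entry
        density.append(table[max(0, min(idx, last))])
    return density


# Hypothetical three-entry table for f, defined on [0, 2].
density = to_density([0.0, 0.4, 1.0, 2.0, 3.0], [0.1, 0.5, 1.0], x_max=2.0)
```

The resulting density values would then take the place of the smoothed power p(n) in the integration and candidate selection of the first embodiment.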
[0072]
Since the encoder and the decoder are provided with the same pulse position candidate search unit 142, including the conversion function f, no information about the adaptation needs to be sent and the bit rate does not increase.
[0073]
FIG. 12 shows the configuration of the speech decoding system of the present embodiment corresponding to the speech encoding system of FIG. 10. Since the operation of this speech decoding system is evident from the operation of the speech decoding systems described in the first to third embodiments, a detailed description is omitted.
[0074]
As described above, in the present embodiment the value of the power envelope of the pitch vector is converted into the density of the pulse position candidates using the function f. The processing procedure is therefore slightly more complicated than in the first embodiment, but a more accurate distribution of the position candidates can be realized. The first embodiment can be regarded as the case where f(x) = x in the present embodiment.
[0075]
FIG. 13 shows a block diagram of a speech encoding system according to the fifth embodiment of the present invention. In this speech encoding system, the pulse position candidate search unit of the first embodiment is configured from a pitch filter inverse operation unit 174, a smoothing unit 175, and a position candidate calculation unit 173. The configuration is otherwise the same as that of the first embodiment.
[0076]
Next, the processing procedure of the present embodiment will be described. As in the first embodiment, after LPC analysis, LPC quantization, and the search of the adaptive codebook 141 are complete, the pitch vector is passed to the pitch filter inverse operation unit 174 of the pulse position candidate search unit 142. The pitch filter inverse operation unit 174 performs an operation representing the inverse characteristic of the pitch period emphasizing unit 160. For example, when the transfer function P(z) of the pitch filter is
$$P(z) = 1 - a z^{-L} \tag{1}$$
the pitch filter inverse operation unit 174 can use a filter whose transfer function Q(z) is given by
$$Q(z) = \frac{1}{1 - b a z^{-L}} \tag{2}$$
Here, a is a constant and b represents the degree of the inverse characteristic. When b = 1, Q(z) is the inverse filter of P(z). The input pitch vector is output after this inverse operation, and the smoothing unit 175 obtains its power envelope in the same manner as the pitch vector smoothing unit 171 of the fourth embodiment. The position candidate calculation unit 173 selects the pulse position candidates according to the power envelope and creates the adaptive algebraic structure codebook 143. Subsequent processing is the same as in the first embodiment.
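Equations (1) and (2) can be checked numerically with the following sketch. It implements the two transfer functions exactly as written above, as difference equations with zero initial state. The function names and the parameter name lag (standing for the pitch period L) are hypothetical.

```python
def pitch_filter(signal, a, lag):
    """P(z) = 1 - a*z^-lag:  y[n] = x[n] - a*x[n-lag]."""
    return [x - a * (signal[n - lag] if n >= lag else 0.0)
            for n, x in enumerate(signal)]


def inverse_pitch_operation(signal, a, b, lag):
    """Q(z) = 1 / (1 - b*a*z^-lag):  y[n] = x[n] + b*a*y[n-lag]."""
    out = []
    for n, x in enumerate(signal):
        feedback = out[n - lag] if n >= lag else 0.0
        out.append(x + b * a * feedback)
    return out


impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
inverted = inverse_pitch_operation(impulse, a=0.5, b=1.0, lag=2)
restored = pitch_filter(inverted, a=0.5, lag=2)  # b = 1: exact inverse
```

Choosing b below 1 weakens the feedback, which corresponds to applying the inverse characteristic only partially.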
[0077]
A feature of the present embodiment is that a pitch vector in consideration of the influence of the pitch period emphasis unit 160 is used for adaptation of a pulse position candidate. The reason why the efficiency is improved by doing this will be described.
[0078]
The noise vector generated from the adaptive algebraic structure codebook is made pitch-periodic by the pitch period emphasizing unit 160. When equation (1) is used for this periodization, a pulse near the head of the subframe is repeated many times within the subframe at the pitch period interval, whereas a pulse in the latter half of the subframe is repeated fewer times. Observation of the noise code vectors actually obtained confirms that, when a strong pitch filter is used, pulses tend to be placed closer to the head of the subframe. This indicates that the pulse position is closely related not only to the shape of the pitch vector but also to the pitch filter. In the present embodiment, using the pitch filter inverse operation unit 174 realizes adaptation of the pulse position candidates that takes the influence of the pitch period emphasizing unit 160 into account.
[0079]
In the third embodiment, two types of filters, a pulse shaping filter and a pitch filter, may be applied to the noise vector. When the present embodiment is applied to such a case, the ideal approach is to obtain the combined characteristic of the two filters and use its inverse characteristic in the pitch filter inverse operation unit. However, because this increases the amount of processing, an effect can still be obtained by using only the characteristic of the pitch filter, whose influence is large. The order of the pitch filter inverse operation unit 174 and the smoothing unit 175 can also be reversed.
[0080]
FIG. 14 shows the configuration of the speech decoding system of the present embodiment corresponding to the speech encoding system of FIG. 13. Since the operation of this speech decoding system is evident from the operation of the speech decoding systems described in the first to fourth embodiments, a detailed description is omitted.
[0081]
FIG. 15 shows a block diagram of a speech coding system according to the sixth embodiment of the present invention. This speech coding system has the same configuration as that of the first embodiment, except that the adaptive algebraic structure codebook of the first embodiment is replaced by a noise vector generator 180 and an amplitude codebook 181.
[0082]
Next, the processing procedure of this embodiment will be described. As in the first embodiment, after LPC analysis, LPC quantization, and the search of the adaptive codebook 141 are complete, the pitch vector is passed to the pulse position search unit 174. The pulse position search unit 174 obtains pulse positions based on the power envelope of the pitch vector in the same manner as in the first embodiment, and outputs them to the noise vector generation unit 180. The present embodiment differs from the previous embodiments in that the noise vector generation unit 180 places pulses at all the positions obtained by the pulse position search unit 174. That is, in the embodiments described so far, pulse position candidates are obtained and the optimum pulse positions are selected from those candidates in the adaptive algebraic structure codebook, whereas in the present embodiment all the pulse position candidates are used simultaneously. The process of selecting pulse positions therefore becomes unnecessary. In its place, a process of selecting the amplitude of each pulse from the amplitude codebook 181 is added. As the output, information D representing the pulse amplitudes is output instead of the information C representing the pulse positions.
[0083]
The method of generating the noise vector will be described in detail with reference to FIG. 16. FIG. 16(a) shows, with arrows, an amplitude pattern obtained from the amplitude codebook. Here, seven pulses are assumed. The waveforms in FIGS. 16(b) and 16(c) are pitch vector power envelopes obtained by the pulse position search unit 174, together with the corresponding pulse positions (indicated by ○ in the figure). In FIG. 16(b) the power has two peaks, so the seven pulse positions are split between two places. In FIG. 16(c) there is a single peak in the center, so the pulse positions are concentrated there. FIGS. 16(d) and 16(e) show the noise vectors obtained by placing pulses with the amplitudes of FIG. 16(a) at each of these pulse positions. It can be seen that the shape of the drive signal also changes according to the pitch vector power envelope. As described above, the power envelope information of the pitch vector need not be transmitted, so in the present embodiment the shape of the noise vector can be brought closer to the ideal noise vector shape without increasing the bit rate.
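The generation step illustrated by FIG. 16 can be sketched as follows: a single amplitude pattern is placed at whatever pulse positions the power envelope yields. The function name and the example positions and amplitudes are hypothetical and are not taken from the figure.

```python
def make_noise_vector(length, pulse_positions, amplitude_pattern):
    """Place the k-th amplitude of the pattern at the k-th pulse position."""
    if len(pulse_positions) != len(amplitude_pattern):
        raise ValueError("one amplitude is needed per pulse position")
    vec = [0.0] * length
    for pos, amp in zip(pulse_positions, amplitude_pattern):
        vec[pos] = amp
    return vec


pattern = [0.5, 1.0, -0.5]                           # from the amplitude codebook
two_peaks = make_noise_vector(10, [1, 2, 7], pattern)  # positions split in two
one_peak = make_noise_vector(10, [4, 5, 6], pattern)   # positions concentrated
```

The same amplitude index thus produces differently shaped noise vectors depending on the pitch vector, without any extra transmitted information.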
[0084]
In the present embodiment, as the bit rate increases, more pulse amplitude information D can be sent and the quality improves, but the degree of improvement diminishes. At a moderately high bit rate, performance may be improved more by including noise vectors with pulses at unselected positions among the search candidates than by increasing the amplitude information. Specifically, the pulse position search unit 174 outputs several different patterns of pulse positions (pulse patterns), and the noise vector generation unit searches for the amplitudes for each pulse pattern. As the pulse patterns, a pulse pattern generated from the pulse positions not selected is prepared in addition to the pulse pattern adapted to the pitch vector as described above. For example, the sample positions that remain after removing the positions selected by the adaptation from all the sample positions of the subframe can be used as a second pulse pattern, and the amplitudes are then searched for the two pulse patterns. The number of bits assigned to the amplitude information may differ for each pulse pattern, and it is generally more efficient to allocate more bits to the pulse pattern obtained by the adaptation. When a plurality of pulse patterns are used, information indicating which pulse pattern was used must be transmitted within the information D, and the amplitude information is reduced accordingly, but the quality is better than when only a single pulse pattern is searched.
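The second pulse pattern described above, built from the sample positions that the adaptation did not select, can be sketched as follows. The function name and the example values are hypothetical.

```python
def complementary_pattern(subframe_length, adapted_positions):
    """Return the sample positions of the subframe not chosen by the adaptation."""
    chosen = set(adapted_positions)
    return [n for n in range(subframe_length) if n not in chosen]


adapted = [1, 4]                               # pattern adapted to the pitch vector
remainder = complementary_pattern(6, adapted)  # second pulse pattern
patterns = [adapted, remainder]                # both are searched for amplitudes
```

An encoder using both patterns would also need to signal which one was chosen, as noted above for the information D.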
[0085]
FIG. 17 shows the configuration of the speech decoding system of the present embodiment corresponding to the speech encoding system of FIG. 15. Since the operation of this speech decoding system is evident from the operation of the speech decoding systems described in the first to fifth embodiments, a detailed description is omitted.
[0086]
In the above embodiments, speech encoding/decoding methods have been described, but the present invention can also be applied to a speech synthesis method. In that case, it suffices to supply the speech decoding systems shown in FIGS. 5, 7, and 9 with the indices corresponding to the speech signal to be synthesized.
[0087]
[Effect of the invention]
As described above, according to the present invention, high-quality speech encoding / decoding can be performed even when using an algebraic structure codebook in which the pulse positions and the number of pulses have been reduced as a result of lowering the encoding rate.
[Brief description of the drawings]
FIG. 1 is a block diagram of a speech encoding system according to a first embodiment of the present invention.
FIG. 2 is a flowchart illustrating a pulse position candidate selection procedure according to the first embodiment.
FIG. 3 is a diagram showing a state of processing in each step of FIG. 2.
FIG. 4 is a diagram illustrating a relationship between a power envelope of a pitch vector and a pulse position candidate according to the first embodiment.
FIG. 5 is a block diagram of a speech decoding system according to the first embodiment.
FIG. 6 is a block diagram of a speech encoding system according to a second embodiment of the present invention.
FIG. 7 is a block diagram of a speech decoding system according to the second embodiment.
FIG. 8 is a block diagram of a speech encoding system according to a third embodiment of the present invention.
FIG. 9 is a block diagram of a speech decoding system according to a third embodiment.
FIG. 10 is a block diagram of a speech encoding system according to a fourth embodiment of the present invention.
FIG. 11 is a diagram showing the power envelope of the pitch vector, the position candidate density function, and the relationship between the values of the power envelope and the position candidate density function.
FIG. 12 is a block diagram of a decoding system according to the fourth embodiment.
FIG. 13 is a block diagram of a speech encoding system according to a fifth embodiment of the present invention.
FIG. 14 is a block diagram of a decoding system according to the fifth embodiment.
FIG. 15 is a block diagram of a speech encoding system according to a sixth embodiment of the present invention.
FIG. 16 is a diagram for explaining a noise vector generation method.
FIG. 17 is a block diagram of a decoding system according to the sixth embodiment.
[Explanation of symbols]
101 ... Audio input terminal
102, 103 ... Gain multiplication unit
104, 105 ... Addition section
110 ... LPC analysis unit
111 ... LPC quantizer
120 ... LPC synthesis unit
130 ... Auditory weighting unit
141 ... Adaptive codebook
142 ... Pulse position candidate search unit
143 ... Adaptive algebraic structure codebook
144 ... Noise codebook
150 ... Code selection unit
160 ... Pitch emphasis unit
161 ... Pulse shaping filter analyzer
162 ... Pulse shaping unit
171 ... Pitch vector smoothing unit
172 ... Position candidate density function calculator
173 ... Position candidate calculation unit
174 ... Pulse position search unit
180 ... Noise vector generation unit
181 ... Amplitude codebook

Claims

1. A speech encoding method comprising: generating synthesis filter information based on an input speech signal in frame units; generating a pitch vector from drive signals stored in an adaptive codebook for each subframe obtained by dividing the frame; generating a pulse train by arranging pulses at a predetermined number of pulse positions selected from pulse position candidates arranged such that more candidates are placed where the power of the pitch vector is larger; synthesizing the pitch vector of the adaptive codebook and the pulse train to generate a new drive signal; and generating synthesized speech from the synthesis filter information and the new drive signal.

2. The speech encoding method according to claim 1, further comprising the step of giving a periodicity of a pitch period to said pulse train.

3. The speech encoding method according to claim 1, wherein the drive signal generating step includes a step of multiplying the pitch vector and the pulse train by a gain.

4. The speech encoding method according to claim 3, further comprising: multiplexing an index indicating the synthesis filter information, an index indicating the pitch vector, an index indicating the pulse train, and an index indicating the gain to generate a bit stream.

5. The speech encoding method according to claim 1, further comprising a step of pulse-shaping the pulse train according to a filter coefficient determined based on the pitch vector.

6. The speech encoding method according to claim 1, wherein the pulse train generating step includes a step of obtaining a power envelope of the pitch vector, a step of converting the power envelope into a position candidate density function, and a step of calculating the pulse position candidates using the position candidate density function.

7. A speech decoding method comprising: reproducing synthesis filter information from encoded speech information; generating a pitch vector from an adaptive codebook based on the encoded information; generating a pulse train by arranging pulses at a predetermined number of pulse positions selected from pulse position candidates arranged such that more candidates are placed where the power of the pitch vector is larger; generating a new drive signal by synthesizing the pitch vector and the pulse train; and generating a reproduced speech signal from the synthesis filter information and the new drive signal.

8. The speech decoding method according to claim 7, further comprising a step of pulse-shaping the pulse train according to a filter coefficient determined based on the pitch vector.