JPWO2003071522A1

JPWO2003071522A1 - Method for generating fixed excitation vector and fixed excitation codebook

Info

Publication number: JPWO2003071522A1
Application number: JP2003570338A
Authority: JP
Inventors: Hiroyuki Ehara (江原　宏幸); Kazutoshi Yasunaga (安永　和敏); Kazunori Mano (間野　一則); Yusuke Hiwasaki (日和崎　祐介)
Original assignee: Panasonic Corp; Nippon Telegraph and Telephone Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Nippon Telegraph and Telephone Corp; Panasonic Holdings Corp
Priority date: 2002-02-20
Filing date: 2003-02-20
Publication date: 2005-06-16
Anticipated expiration: 2023-02-20
Also published as: AU2003211229A1; WO2003071522A1; US7580834B2; JP4299676B2; US20050228652A1

Abstract

On the speech encoding side, a fixed excitation vector is generated as follows. A pulse excitation vector shape determiner 302 determines the shape of the pulse excitation vector output from a pulse excitation codebook 301, a dispersion vector store 304 outputs the dispersion vector to be applied to an excitation vector of that shape, and a dispersion vector convolution processor 303 convolves the dispersion vector with the excitation vector. In particular, when the pulse excitation codebook 301 outputs a pulse excitation vector having a specific, frequently used shape, the pulse excitation vector shape determiner 302 controls the dispersion vector store 304 so that it outputs an additional dispersion vector prepared exclusively for that pulse excitation vector. This improves the quality of decoded speech and makes it possible to restore speech that sounds more natural and is easier for the user to listen to.

Description

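Technical Field
The present invention relates to a method for generating a fixed excitation vector and to a fixed excitation codebook, both used in a CELP speech encoder or a CELP speech decoder.
Background Art
In fields such as digital mobile communication, packet communication typified by Internet communication, and speech storage, speech encoders that compress speech information and encode it efficiently are used to make effective use of transmission channel capacity (for example, radio spectrum) and of storage media.
Among these, schemes based on CELP (Code Excited Linear Prediction) are in wide practical use at medium and low bit rates. The CELP technique that uses a pulse excitation as the driving excitation signal is described in M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High-quality Speech at Very Low Bit Rates", Proc. ICASSP-85, 25.1.1, pp. 937-940, 1985.
CELP speech coding divides a digitized speech signal into frames of fixed length (roughly 5 ms to 50 ms), performs linear prediction on the speech in each frame, and encodes the per-frame prediction residual (the excitation signal) using an adaptive codebook and a noise (fixed) codebook made up of known waveforms.
The adaptive codebook stores previously generated driving excitation signals and is used to represent the periodic component of the speech signal. The fixed codebook stores a predetermined number of vectors with predetermined shapes, prepared in advance, and is used mainly to represent aperiodic components that the adaptive codebook cannot represent.
Vectors stored in the fixed codebook include vectors consisting of random noise sequences and vectors represented by a combination of a few pulses.
The algebraic fixed codebook is a typical fixed codebook that represents a vector by a combination of a few pulses. Details are given in, for example, ITU-T Recommendation G.729. An algebraic fixed codebook can be searched with little computation and reduces the ROM needed to store excitation vectors. On the other hand, it has difficulty representing noise-like components faithfully.
One way to address this problem of the algebraic fixed codebook is to use a dispersed-pulse codebook. Pulse dispersion is disclosed in, for example, ITU-T Recommendation G.729 Annex D. Pulse dispersion generates a fixed excitation vector by convolving a dispersion pattern (a fixed waveform) with an excitation vector.
FIG. 1 is a block diagram showing an example of the configuration of a conventional fixed excitation codebook with a pulse dispersion structure. The dispersed-pulse codebook 10 comprises a pulse excitation codebook 11, a dispersion vector convolution processor 12, and a dispersion vector store 13.
The pulse excitation codebook 11 outputs a pulse excitation vector, and the dispersion vector convolution processor 12 convolves it with a dispersion vector read from the dispersion vector store 13. This yields a fixed excitation vector (noise excitation vector).
Conventional pulse dispersion can improve the performance of a pulse excitation codebook at low bit rates, for example 4 kbit/s or below.
However, next-generation mobile phone systems, for example, call for a still larger quality improvement (that is, a further improvement in the quality of decoded speech), and existing techniques have difficulty meeting this demand.
For example, simply increasing the number of dispersion vector patterns does not improve decoded speech quality in proportion, and it may increase memory requirements and complicate signal processing.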
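Disclosure of the Invention
An object of the present invention is to provide a technique that improves speech quality on the speech encoding side or the decoding side, thereby further raising the quality of decoded speech so that speech that is more natural and easier for the user to listen to can be restored.
On the speech encoding side, this object is achieved, with regard to generation of the fixed excitation vector, by selecting in advance from the many pulse excitation vectors those having, for example, specific shapes that are used frequently, and preparing dedicated dispersion vectors for the selected pulse excitation vectors.
On the speech decoding side, it is achieved by applying, for example, high-band emphasis processing with novel characteristics to the excitation signal (a signal modelling the sound produced by the human vocal cords) before it is input to the synthesis filter (which models the human vocal tract).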
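Best Mode for Carrying Out the Invention
Embodiments of the present invention are described below with reference to the drawings.
First, the overall configuration of the speech signal transmitter and the speech signal receiver of the present invention is outlined with reference to FIG. 2.
In FIG. 2, a speech signal 101 is converted into an electrical signal by an input device 102 and output to an A/D converter 103. The A/D converter 103 converts the (analog) signal from the input device 102 into a digital signal and outputs it to a speech encoder 104. The speech encoder 104 encodes the digital speech signal from the A/D converter 103 using the speech encoding method described later and outputs the encoded information to an RF modulator 105. The RF modulator 105 converts the encoded speech information from the speech encoder 104 into a signal for transmission over a propagation medium such as radio waves and outputs it to a transmitting antenna 106. The transmitting antenna 106 emits the output of the RF modulator 105 as a radio wave (RF signal). RF signal 107 in the figure denotes the radio wave (RF signal) emitted from the transmitting antenna 106. This completes the description of the configuration and operation of the speech signal transmitter.
An RF signal 108 is received by a receiving antenna 109 and output to an RF demodulator 110. RF signal 108 in the figure denotes the radio wave received by the receiving antenna 109; if the propagation path introduces no attenuation or noise, it is identical to RF signal 107.
The RF demodulator 110 demodulates the encoded speech information from the RF signal output by the receiving antenna 109 and outputs it to a speech decoder 111. The speech decoder 111 decodes a speech signal from the encoded speech information output by the RF demodulator 110 using the speech decoding method described later and outputs it to a D/A converter 112. The D/A converter 112 converts the digital speech signal from the speech decoder 111 into an analog electrical signal and outputs it to an output device 113.
The output device 113 converts the electrical signal into air vibrations and outputs them as sound waves audible to the human ear. Reference numeral 114 in the figure denotes the output sound wave. This completes the description of the configuration and operation of the speech signal receiver.
A base station apparatus and a mobile terminal apparatus in a mobile communication system can be configured by providing at least one of the speech signal transmitter and receiver described above.
The following describes in turn, with reference to the drawings, the improved generation of fixed excitation vectors using dispersion vectors on the speech encoding side (Embodiment 1) and the high-band emphasis processing on the speech decoding side (Embodiment 2).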
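(Embodiment 1)
Embodiment 1 describes a case in which the fixed excitation codebook is provided with dedicated dispersion vectors (hereinafter "additional dispersion vectors") for pulse excitation vectors of predetermined shapes, and the optimum dispersion vector is applied according to the shape of the pulse excitation vector.
FIG. 3 is a block diagram showing the configuration of the speech encoder 104 installed in the speech signal transmitter of FIG. 2.
The input to the speech encoder 104 is the signal output from the A/D converter 103, which is fed to a preprocessing unit 200. The preprocessing unit 200 performs high-pass filtering to remove the DC component, as well as waveform shaping and pre-emphasis that improve the performance of the subsequent encoding, and outputs the processed signal (Xin) to an LPC analysis unit 201 and an adder 204.
The LPC analysis unit 201 performs linear prediction analysis on Xin and outputs the result (linear prediction coefficients) to an LPC quantization unit 202. The LPC quantization unit 202 quantizes the linear prediction coefficients (LPC) from the LPC analysis unit 201, outputs the quantized LPC to a synthesis filter 203, and outputs a code L representing the quantized LPC to a multiplexer 213.
The synthesis filter 203 filters the driving excitation output from an adder 210 (described later) using filter coefficients based on the quantized LPC, thereby generating a synthesized signal, which it outputs to the adder 204.
The adder 204 computes the error signal between Xin and the synthesized signal and outputs it to a perceptual weighting unit 211. The perceptual weighting unit 211 applies perceptual weighting to the error signal from the adder 204, computes the distortion between Xin and the synthesized signal in the perceptually weighted domain, and outputs it to a parameter determination unit 212.
The parameter determination unit 212 selects, from an adaptive excitation codebook 205, a fixed excitation codebook 207, and a quantization gain generator 206 respectively, the adaptive excitation vector, fixed excitation vector, and quantization gain that minimize the coding distortion output from the perceptual weighting unit 211, and outputs an adaptive excitation vector code (A), an excitation gain code (G), and a fixed excitation vector code (F) indicating the selection to the multiplexer 213. In addition, when the pulse excitation vector selected in the fixed excitation codebook 207 has one of the preset specific shapes, the parameter determination unit 212 checks whether the set of additional dispersion vectors prepared for that vector contains a dispersion vector giving a smaller quantization error than the basic dispersion vector, selects from the basic and additional dispersion vectors the one that minimizes the quantization error, and outputs a control signal indicating the selection to the fixed excitation codebook 207.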
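The adaptive excitation codebook 205 buffers the driving excitation signals previously output by the adder 210. From the past driving excitation samples specified by the signal from the parameter determination unit 212, it extracts one frame of samples as the adaptive excitation vector and outputs it to a multiplier 208.
The quantization gain generator 206 outputs the adaptive excitation gain and the fixed excitation gain specified by the signal from the parameter determination unit 212 to multipliers 208 and 209, respectively.
The fixed excitation codebook 207 outputs to the multiplier 209 a fixed excitation vector obtained by convolving a dispersion vector with the pulse excitation vector whose shape is specified by the signal from the parameter determination unit 212. The configuration of this fixed excitation codebook 207 is the characteristic part of this embodiment and is described in detail later.
The multiplier 208 multiplies the adaptive excitation vector from the adaptive excitation codebook 205 by the quantized adaptive excitation gain from the quantization gain generator 206 and outputs the result to the adder 210.
The multiplier 209 multiplies the fixed excitation vector from the fixed excitation codebook 207 by the quantized fixed excitation gain from the quantization gain generator 206 and outputs the result to the adder 210.
The adder 210 receives the gain-scaled adaptive excitation vector and fixed excitation vector from the multipliers 208 and 209, adds them as vectors, and outputs the resulting driving excitation to the synthesis filter 203 and the adaptive excitation codebook 205.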
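As a rough illustration of the path through multipliers 208 and 209 and adder 210, the driving excitation can be sketched as a gain-weighted sum of the two codebook outputs. The function and argument names are illustrative assumptions, not taken from the source, and plain Python lists stand in for one frame of samples.

```python
def driving_excitation(adaptive_vec, fixed_vec, gain_adaptive, gain_fixed):
    """Gain-scale both codebook vectors (multipliers 208, 209) and add them (adder 210)."""
    if len(adaptive_vec) != len(fixed_vec):
        raise ValueError("both vectors must cover the same frame")
    return [gain_adaptive * a + gain_fixed * f
            for a, f in zip(adaptive_vec, fixed_vec)]


# Hypothetical 4-sample frame, for illustration only.
excitation = driving_excitation([1.0, 0.0, -1.0, 0.5], [0.0, 2.0, 0.0, -2.0], 0.5, 0.25)
```

In the encoder this sum is both filtered by the synthesis filter and written back into the adaptive codebook buffer, which is why the same vector feeds two blocks.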
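The multiplexer 213 receives the code (L) representing the quantized LPC from the LPC quantization unit 202 and the codes representing the adaptive excitation vector (A), the fixed excitation vector (F), and the quantization gain (G) from the parameter determination unit 212, multiplexes them, and outputs the result to the transmission channel as encoded information.
This completes the description of the components of the speech encoder 104.
Next, the specific configuration and features of the fixed excitation codebook 207 are described with reference to the drawings.
FIG. 4 is a block diagram showing the configuration of the fixed excitation codebook 207 of FIG. 3.
In FIG. 4, a pulse excitation codebook 301 outputs a pulse excitation vector to both a pulse excitation vector shape determiner 302 and a dispersion vector convolution processor 303.
The pulse excitation vector shape determiner 302 stores predetermined vector shapes in memory, each associated with the parameters that identify it. When a pulse excitation vector consists of only a few pulses, its shape is identified by the distance between pulses (the number of samples separating them) and the polarity relationship of the pulses (opposite or same polarity). In that case the inter-pulse distance and the polarity relationship serve as the parameters.
The pulse excitation vector shape determiner 302 compares the parameters of the pulse excitation vector output from the pulse excitation codebook 301 with the parameters of each stored vector shape and, for example, judges the vectors to have the same shape when all parameters match. When a pulse excitation vector consists of only a few pulses, the pulse excitation vector shape determiner 302 judges vectors to have the same shape if the relative positions and polarity relationships of the pulses are the same. Vectors with the same pulse spacing and pulse polarities that differ only by a shift along the time axis, or by a constant scaling of the vector magnitude (pulse amplitude), are also judged to have the same shape.
When a vector of the same shape exists, the pulse excitation vector shape determiner 302 outputs a control signal to a dispersion vector store 304 so that it outputs the additional dispersion vectors designed specifically for pulse excitation vectors of that shape. When no vector of the same shape exists, it outputs a control signal to the dispersion vector store 304 so that it outputs the basic dispersion vector.
The dispersion vector store 304 holds in memory, in addition to the basic dispersion vector used in common for all pulse excitation vectors, the additional dispersion vectors used for pulse excitation vectors of predetermined shapes, and switches the dispersion vector it outputs to the dispersion vector convolution processor 303 according to control signals from the parameter determination unit 212 and from the pulse excitation vector shape determiner 302. In other words, the dispersion vector store 304 selects the dispersion vector corresponding to the pulse excitation vector shape judged by the pulse excitation vector shape determiner 302 and outputs it to the dispersion vector convolution processor 303.
The dispersion vector convolution processor 303 convolves the pulse excitation vector output from the pulse excitation codebook 301 with the dispersion vector read from the dispersion vector store 304. This yields a fixed excitation vector (noise excitation vector).
By thus selecting the dispersion vector shape best suited to the shape of the excitation vector and convolving it, coding performance can be improved compared with applying predetermined dispersion vectors (one or more kinds of basic dispersion vectors) to all pulse excitation vectors.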
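The convolution performed by the dispersion vector convolution processor 303 can be sketched as below. This is a minimal illustration under stated assumptions: the names are hypothetical, and dropping samples that would fall past the end of the vector is an assumption, since the text does not say how the tail is handled.

```python
def disperse(pulse_vector, dispersion_vector):
    """Convolve a sparse pulse vector with a dispersion vector, truncated to the vector length."""
    n = len(pulse_vector)
    fixed = [0.0] * n
    for pos, amp in enumerate(pulse_vector):
        if amp == 0.0:
            continue  # only the few non-zero pulses contribute
        for k, d in enumerate(dispersion_vector):
            if pos + k < n:  # assumption: samples past the end are dropped
                fixed[pos + k] += amp * d
    return fixed


pulses = [0.0, 1.0, 0.0, -1.0, 0.0, 0.0]   # +1 at sample 1, -1 at sample 3
shaped = disperse(pulses, [1.0, 0.5, 0.25])
unchanged = disperse(pulses, [1.0])          # an impulse leaves the pulses as they are
```

Because the pulse vector has only a few non-zero samples, the loop costs roughly (number of pulses) × (dispersion vector length) multiplications, which is part of why pulse codebooks stay cheap even with dispersion.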
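The pulse excitation vector shape determiner 302 may store any number of vector shapes in memory. However, preparing additional dispersion vectors only for excitation vectors of frequently used specific shapes limits the number of additional dispersion vectors and keeps down the increase in ROM size caused by introducing them.
The following describes specifically how to choose the frequently used specific-shape excitation vectors stored a priori in the memory of the pulse excitation vector shape determiner 302, and how to choose the additional dispersion vectors applied to them.
FIGS. 5A and 5B show the usage frequency distribution of pulse excitation vectors output from the pulse excitation codebook 301 (for the two-pulse case), with the inter-pulse distance and the pulse polarities as parameters, compiled by actually encoding several hours of speech data. FIG. 5B is an enlargement of FIG. 5A along the horizontal axis. In both figures the horizontal axis is the inter-pulse distance (in samples) and the vertical axis is the normalized usage frequency of excitation vectors with that distance. The origin corresponds to the two pulses overlapping, giving a one-pulse excitation vector; the left of the origin corresponds to opposite-polarity pulse pairs and the right to same-polarity pairs.
The normalized usage frequency is the number of times pulse excitation vectors with a given spacing were used divided by the number of pulse combinations with that spacing. For example, for a spacing of one sample there are multiple combinations (first pulse at sample 1 and second pulse at sample 2, sample 2 and sample 3, and so on), so the frequency is normalized by the total number of combinations the pulse excitation codebook can generate.
As is clear from FIGS. 5A and 5B, usage is concentrated, regardless of the polarity combination, on excitation vectors whose two pulses are no more than two samples apart.
Accordingly, five kinds of excitation vectors with an inter-pulse distance of two samples or less (distance 0; distance 1 with same polarity; distance 1 with opposite polarity; distance 2 with same polarity; distance 2 with opposite polarity) are selected to be stored in the memory of the pulse excitation vector shape determiner 302.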
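The shape test for two-pulse vectors can be sketched as follows, using the two parameters named in the text (inter-pulse distance and polarity relationship) and the five selected shapes. The function names and the (position, amplitude) pair representation are assumptions made for illustration.

```python
# The five frequently used shapes: (inter-pulse distance, polarity relationship).
SPECIAL_SHAPES = {
    (0, "same"),
    (1, "same"), (1, "opposite"),
    (2, "same"), (2, "opposite"),
}


def shape_of(pulses):
    """pulses: two (position, amplitude) pairs. Returns (distance, relation)."""
    (p1, a1), (p2, a2) = pulses
    distance = abs(p1 - p2)                      # unaffected by a time shift
    relation = "same" if a1 * a2 > 0 else "opposite"  # unaffected by amplitude scaling
    return distance, relation


def is_special(pulses):
    """True if the vector has one of the stored shapes."""
    return shape_of(pulses) in SPECIAL_SHAPES
```

Because only the distance and the sign of the amplitude product are used, a vector shifted along the time axis or scaled by a constant maps to the same shape, matching the equivalence the shape determiner applies.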
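Next, dedicated additional dispersion vectors are designed by training for each of the selected excitation vectors.
The dispersion vectors are trained based on the generalized Lloyd algorithm, as described in, for example, Section 3.1 of K. Yasunaga et al., "Dispersed-pulse codebook and its application to a 4 kb/s speech coder," Proc. ICASSP 2000, pp. 1503-1506, 2000, so as to determine the dispersion vectors that minimize the total coding distortion over the training data.
FIGS. 6 to 10 show examples of the designed additional dispersion vectors, with four additional dispersion vectors designed for each excitation vector.
FIG. 6 shows that four dedicated dispersion vectors (A1 to A4) are assigned to the excitation vector with an inter-pulse distance of two samples and the same polarity. Likewise, FIG. 7 shows that four additional dispersion vectors (B1 to B4) are provided for the excitation vector with an inter-pulse distance of one sample and the same polarity. Similarly, FIGS. 8, 9, and 10 show four additional dispersion vectors each for the excitation vectors with an inter-pulse distance of 0 samples (same polarity), 1 sample with opposite polarity, and 2 samples with opposite polarity, respectively. As FIGS. 6 to 10 show, the additional dispersion vectors obtained for the five kinds of pulse excitation vectors have mutually different characteristics.
If training is performed with a dispersion vector common to all excitation vectors, the result is a vector with the average shape of these differently characterized dispersion vectors, which limits the achievable improvement. FIG. 11 shows an example of a basic dispersion vector.
Although FIGS. 6 to 10 assume four additional dispersion vectors per excitation vector, the present invention is not limited to this. For example, there may be just one additional dispersion vector for each excitation vector shown in FIGS. 6 to 10.
Although not illustrated, when there are three pulses, separate additional dispersion vectors are likewise provided for each frequently used specific-shape excitation vector.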
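FIG. 12 illustrates in detail the selection processing of the dispersion vector store 304 when the additional dispersion vectors are those shown in FIGS. 6 to 10.
As shown in FIG. 12, the dispersion vector store 304 comprises a plurality of dispersion vector subsets 400 to 405.
Dispersion vector subset 400 has a terminal X0 that outputs the basic dispersion vector, and outputs the basic dispersion vector to the dispersion vector convolution processor 303 via a switch 406.
Dispersion vector subset 401 has terminals A1 to A4, which output the four additional dispersion vectors shown in FIG. 6, and a terminal A0, which outputs the basic dispersion vector. A switch 407 selects, from the five dispersion vectors A0 to A4, the one determined by the parameter determination unit 212, and it is output to the dispersion vector convolution processor 303 via the switch 406.
Likewise, dispersion vector subsets 402 to 405 have terminals B1 to B4, C1 to C4, D1 to D4, and E1 to E4, which output the four additional dispersion vectors shown in FIGS. 7 to 10 respectively, and terminals B0, C0, D0, and E0, which output the basic dispersion vector. Switches 408, 409, 410, and 411 each select the dispersion vector determined by the parameter determination unit 212, and it is output to the dispersion vector convolution processor 303 via the switch 406.
In FIG. 12, the basic dispersion vectors output from terminals X0, A0, B0, C0, D0, and E0 are identical.
The switch 406, which switches among dispersion vector subsets 400 to 405, is controlled by the pulse excitation vector shape determiner 302 according to the shape of the pulse excitation vector output from the pulse excitation codebook 301. That is, when a pulse excitation vector of a frequently used specific shape is input from the pulse excitation codebook 301 to the pulse excitation vector shape determiner 302, the switch 406 is connected to the output terminal of the dispersion vector subset (one of 401 to 405) corresponding to that shape. When a pulse excitation vector that does not have a specific shape is input, the switch 406 is connected to the output terminal of dispersion vector subset 400.
Switches 407 to 411 each connect to the terminal that outputs the dispersion vector determined by the parameter determination unit 212 from among the five dispersion vectors of the respective dispersion vector subset 401 to 405.
With this configuration, when an excitation vector identical in shape to one stored in the pulse excitation vector shape determiner 302 is output from the pulse excitation codebook 301, the optimum dispersion vector is selected from five candidates: the four additional dispersion vectors and the basic dispersion vector.
Although FIG. 12 shows five dispersion vector subsets with additional dispersion vectors, the present invention places no limit on the number of subsets, which can be increased or decreased according to the number of frequently used pulse excitation vector patterns. Likewise, each subset here has four additional dispersion vectors, but the present invention places no limit on that number.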
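FIG. 13 shows the key steps of the processing described above. FIG. 13 is a flowchart of the search of the fixed excitation codebook shown in FIG. 4.
First, in ST501, a pulse excitation search is performed using the basic dispersion vector. An impulse (that is, no dispersion) may be used as the basic dispersion vector. Specific search methods are disclosed in, for example, Japanese Unexamined Patent Publication No. H10-63300 (paragraph 17 (prior art) and paragraphs 51 to 54) and Section 2.2 of K. Yasunaga et al., "Dispersed-pulse codebook and its application to a 4 kb/s speech coder," Proc. ICASSP 2000, pp. 1503-1506, 2000.
Next, ST502 checks whether the pulse excitation vector selected in ST501 has the parameters (combination of pulse positions and polarities) of one of the predetermined specific shapes.
These specific shapes are the shapes of those vectors, among the pulse excitation vectors generated from the pulse excitation codebook, that are frequently used as fixed excitation vectors (that is, frequently selected by the search).
More concretely, for a two-pulse excitation, frequently used shapes include, for example, one with an inter-pulse distance of one sample (for example, pulses at the 11th and 12th samples) and opposite polarities, and one with an inter-pulse distance of two samples (for example, pulses at the 20th and 22nd samples) and the same polarity.
If the excitation vector does not have such a specific shape, the pulse excitation vector selected in ST501 convolved with the basic dispersion vector is used as the fixed excitation vector.
That is, the switch 406 in FIG. 12 is connected to terminal X0 of dispersion vector subset 400. If the pulse excitation vector selected in ST501 has a specific shape, processing proceeds to ST503.
ST503 checks whether the additional dispersion vectors of the dispersion vector subset prepared for vectors of that specific shape (dispersion vector subsets 401 to 405 in FIG. 12) include a dispersion vector giving a smaller quantization error than the basic dispersion vector, and selects, from the basic and additional dispersion vectors, the one that minimizes the quantization error. Which dispersion vector subset is used is decided by the pulse excitation vector shape determiner 302.
The pulse excitation vector selected in ST501 convolved with the dispersion vector selected in ST502 or ST503 is then selected as the fixed excitation code vector.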
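The ST502/ST503 decision can be sketched as a small selection loop. This is a simplified illustration, not the search used by the encoder: it assumes the error is measured directly against a target vector with a gain-optimized squared error, and it leaves out the synthesis filter and perceptual weighting. All names are hypothetical.

```python
def disperse(pulse_vector, dispersion_vector):
    """Truncated convolution of a pulse vector with a dispersion vector."""
    n = len(pulse_vector)
    out = [0.0] * n
    for pos, amp in enumerate(pulse_vector):
        if amp != 0.0:
            for k, d in enumerate(dispersion_vector):
                if pos + k < n:
                    out[pos + k] += amp * d
    return out


def gain_optimised_error(target, candidate):
    """Squared error left after scaling the candidate by its best gain."""
    energy = sum(c * c for c in candidate)
    target_energy = sum(t * t for t in target)
    if energy == 0.0:
        return target_energy
    corr = sum(t * c for t, c in zip(target, candidate))
    return target_energy - corr * corr / energy


def choose_dispersion(target, pulse_vector, basic, additional, is_special_shape):
    """Return (index, fixed_vector); index 0 is the basic vector, 1.. are additional ones."""
    candidates = [basic] + (list(additional) if is_special_shape else [])
    best = None
    for index, dv in enumerate(candidates):
        vec = disperse(pulse_vector, dv)
        err = gain_optimised_error(target, vec)
        if best is None or err < best[2]:
            best = (index, vec, err)
    return best[0], best[1]
```

Only vectors of a stored shape pay for the extra comparisons, so the added search cost is limited to a handful of candidates for those vectors.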
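Preparing multiple dedicated additional dispersion vectors only for pulse excitation vectors of frequently used specific shapes adds little information. Depending on the pulse excitation codebook (one that has unused codes), it can even be realized with no increase in the number of bits, so it is easy to implement.
The encoding and decoding of a fixed excitation codebook generated in this way are now described with a concrete example: placing two pulses in 80 samples. Call the two pulses pulse 1 and pulse 2. Each may be placed at any one of the 80 samples, and both may be placed at the same sample. In that case the pulse amplitude is the sum of the amplitudes of pulse 1 and pulse 2, so if both have amplitude 1 the result is a single pulse of amplitude 2. When the two pulses are at different samples, there are C(80, 2) = 3160 position combinations. The polarity relationship of the two pulses can be the same or opposite, so there are 3160 × 2 = 6320 pulse excitation vector shapes. Adding the 80 cases in which the two pulses coincide gives a total of 6400 shapes. Finally, the overall polarity of the pulse excitation vector can take two values, so the number of pulse excitation vectors to encode is 6400 × 2 = 12800, which fits in 14 bits.
By adopting the convention that the two pulses have opposite polarity when pulse 2 comes after pulse 1, and the same polarity when pulse 2 is at the same position as pulse 1 or comes before it, and by representing the polarity of pulse 1 with one bit, the 12800 vectors can be represented with 14 bits.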
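The counting argument above can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers come from the 80-sample, two-pulse example in the text.

```python
from math import comb

N = 80                                    # samples available for each pulse
distinct_positions = comb(N, 2)           # two pulses at different samples
shapes_two_pulses = distinct_positions * 2  # same or opposite polarity
shapes_total = shapes_two_pulses + N      # plus coinciding pulses
vectors_total = shapes_total * 2          # overall polarity of the vector
bits_needed = (vectors_total - 1).bit_length()
```

Since 2^13 = 8192 < 12800 ≤ 2^14 = 16384, 13 bits are not enough and 14 bits leave 3584 codes spare, which is the headroom the scheme uses for the additional dispersion vectors.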
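A method of representing this fixed codebook with 14-bit codes is described below. This kind of coding is disclosed in, for example, the 3GPP AMR codec specifications (3GPP TS 26.090, TS 26.073, and TS 26.104).
First, a pulse excitation search determines the positions and polarities of pulse 1 and pulse 2. Next, the positional relationship of pulse 1 and pulse 2 is examined. If pulse 2 comes after pulse 1, the polarities are checked, and if they are not opposite, the positions of pulse 1 and pulse 2 are swapped. Conversely, if pulse 2 is at the same position as pulse 1 or comes before it, the polarities are checked, and if they are not the same, the positions of pulse 1 and pulse 2 are swapped.
Pulse 1 and pulse 2 determined in this way are encoded as follows. The 14 bits are numbered bit 0 to bit 13 (bit 0 is the least significant bit). The most significant bit, bit 13 (= S), represents the polarity of pulse 1: 1 if positive, 0 if negative.
Next, the combination of the two pulse positions is coded. Letting p1 be the position of pulse 1 and p2 the position of pulse 2, the code CF is CF = p1 × 80 + p2. The resulting CF lies between 0 and 6399 and is represented in the 13 bits from bit 0 to bit 12 (0 to 8191). The remaining codes 6400 to 8191 can therefore be assigned to fixed code vectors to which additional dispersion vectors are applied.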
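Suppose four additional dispersion vectors are assigned to each of the following five pulse excitation vector shapes:
(1) distance between pulse 1 and pulse 2 is 2 samples, same polarity (78 combinations)
(2) distance between pulse 1 and pulse 2 is 1 sample, same polarity (79 combinations)
(3) distance between pulse 1 and pulse 2 is 0 samples, same polarity (80 combinations)
(4) distance between pulse 1 and pulse 2 is 1 sample, opposite polarity (79 combinations)
(5) distance between pulse 1 and pulse 2 is 2 samples, opposite polarity (78 combinations)
Then the codes can be assigned as follows: shape (1) needs 78 × 4 = 312 codes, 6400 to 6711; shape (2) needs 79 × 4 = 316 codes, 6712 to 7027; shape (3) needs 80 × 4 = 320 codes, 7028 to 7347; shape (4) needs 79 × 4 = 316 codes, 7348 to 7663; and shape (5) needs 78 × 4 = 312 codes, 7664 to 7975. Specifically, letting dv (= 0 to 3) be the index of the additional dispersion vector selected by the search, the code CF is generated according to the shape judged by the pulse excitation vector shape determiner:
Shape (1): CF = 6400 + 78 × dv + (p1 - 2), (2 ≤ p1 ≤ 79)
Shape (2): CF = 6712 + 79 × dv + (p1 - 1), (1 ≤ p1 ≤ 79)
Shape (3): CF = 7028 + 80 × dv + p1, (0 ≤ p1 ≤ 79)
Shape (4): CF = 7348 + 79 × dv + p1, (0 ≤ p1 ≤ 78)
Shape (5): CF = 7664 + 78 × dv + p1, (0 ≤ p1 ≤ 77)
Finally, the polarity bit is placed in the most significant position to generate the transmitted code F (F = S × 8192 + CF).
In this way, the position p1 and polarity s1 of pulse 1, the position p2 and polarity s2 of pulse 2, and the information on the dispersion vector to apply are encoded.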
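The encoding rules above can be sketched in a few lines. This is a minimal illustration of the 80-sample, two-pulse example: the function and table names are assumptions, polarities are passed as +1 or -1, and `dv=None` is a convention chosen here to mean that the basic dispersion vector is used.

```python
N = 80  # samples in the worked example

# Offset of p2 from p1 after the ordering rule -> (first code, codes per dv, smallest p1).
SPECIAL = {
    -2: (6400, 78, 2),  # (1) distance 2, same polarity
    -1: (6712, 79, 1),  # (2) distance 1, same polarity
    0: (7028, 80, 0),   # (3) distance 0 (pulses coincide)
    1: (7348, 79, 0),   # (4) distance 1, opposite polarity
    2: (7664, 78, 0),   # (5) distance 2, opposite polarity
}


def encode_pulses(p1, s1, p2, s2, dv=None):
    """Return the 14-bit code F; dv=None selects the basic dispersion vector."""
    same = s1 == s2
    if p1 == p2 and not same:
        raise ValueError("coinciding pulses must share a polarity")
    # Ordering rule: opposite polarity <=> pulse 2 after pulse 1; swap otherwise.
    if (p2 > p1 and same) or (p2 < p1 and not same):
        p1, s1, p2, s2 = p2, s2, p1, s1
    if dv is None:
        cf = p1 * N + p2
    else:
        if not 0 <= dv <= 3:
            raise ValueError("dv must be 0..3")
        first, per_dv, p1_min = SPECIAL[p2 - p1]  # KeyError if the shape is not special
        cf = first + per_dv * dv + (p1 - p1_min)
    sign_bit = 1 if s1 > 0 else 0
    return sign_bit * 8192 + cf
```

For example, pulses at samples 10 and 12 with the same positive polarity and dv = 1 are reordered so that p1 = 12, giving CF = 6400 + 78 + 10 = 6488 and F = 8192 + 6488 = 14680.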
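Next, decoding at a decoder that has received the transmitted code F is described. The decoder recovers the two pulse positions (p1, p2) and polarities (s1, s2) as follows.
First, the polarity information S is decoded from the received code F.
S = ((F >> 13) & 1) × 2 - 1 (S is -1 or +1)
Next, the pulse position code CF is decoded.
CF = F & 0x1FFF
Processing then branches on the value of CF as follows.
(1) CF is less than 6400
p2 = CF % 80, p1 = (CF - p2) ÷ 80
s1 = S, s2 = -S (if p2 > p1) or +S (if p2 ≤ p1)
The basic dispersion vector is used.
(2) CF is at least 6400 and less than 6712
p1 = (CF - 6400) % 78 + 2, p2 = p1 - 2, s1 = s2 = S
The dv-th additional dispersion vector of subset 1 (FIG. 6) is used.
dv = ((CF - 6400) - (p1 - 2)) ÷ 78
(3) CF is at least 6712 and less than 7028
p1 = (CF - 6712) % 79 + 1, p2 = p1 - 1, s1 = s2 = S
The dv-th additional dispersion vector of subset 2 (FIG. 7) is used.
dv = ((CF - 6712) - (p1 - 1)) ÷ 79
(4) CF is at least 7028 and less than 7348
p1 = (CF - 7028) % 80, p2 = p1, s1 = s2 = S
The dv-th additional dispersion vector of subset 3 (FIG. 8) is used.
dv = ((CF - 7028) - p1) ÷ 80
(5) CF is at least 7348 and less than 7664
p1 = (CF - 7348) % 79, p2 = p1 + 1, s1 = S, s2 = -S
The dv-th additional dispersion vector of subset 4 (FIG. 9) is used.
dv = ((CF - 7348) - p1) ÷ 79
(6) CF is at least 7664 and less than 7976
p1 = (CF - 7664) % 78, p2 = p1 + 2, s1 = S, s2 = -S
The dv-th additional dispersion vector of subset 5 (FIG. 10) is used.
dv = ((CF - 7664) - p1) ÷ 78
In this way, the position p1 and polarity s1 of pulse 1, the position p2 and polarity s2 of pulse 2, and the information on the dispersion vector to apply are decoded.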
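The decoding branches can be collapsed into a table-driven sketch. The names are illustrative assumptions, and returning `dv=None` for the basic dispersion vector is a convention chosen here. The last range is taken to end at code 7975 inclusive, which is what the code assignment (7664 to 7975) implies.

```python
# (first code, codes per dv, smallest p1, p2 - p1, polarity of pulse 2 relative to pulse 1)
RANGES = [
    (6400, 78, 2, -2, +1),  # subset 1: distance 2, same polarity
    (6712, 79, 1, -1, +1),  # subset 2: distance 1, same polarity
    (7028, 80, 0, 0, +1),   # subset 3: distance 0
    (7348, 79, 0, +1, -1),  # subset 4: distance 1, opposite polarity
    (7664, 78, 0, +2, -1),  # subset 5: distance 2, opposite polarity
]


def decode_pulses(code):
    """Return (p1, s1, p2, s2, dv); dv is None when the basic dispersion vector applies."""
    s = ((code >> 13) & 1) * 2 - 1   # bit 13 -> -1 or +1
    cf = code & 0x1FFF               # lower 13 bits
    if cf < 6400:
        p1, p2 = divmod(cf, 80)
        s2 = -s if p2 > p1 else s
        return p1, s, p2, s2, None
    for first, per_dv, p1_min, offset, rel in RANGES:
        if first <= cf < first + 4 * per_dv:
            dv, r = divmod(cf - first, per_dv)
            p1 = r + p1_min
            return p1, s, p1 + offset, rel * s, dv
    raise ValueError("codes 7976..8191 are unassigned in this example")
```

Using `divmod` gives the same dv as the subtract-then-divide form in the text, because subtracting the remainder before dividing is exactly integer division.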
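FIG. 14 is a block diagram showing another configuration of the fixed excitation codebook.
The fixed excitation codebook 207 of FIG. 14 has two fixed excitation codebook subsets 608 and 609. The first fixed excitation codebook subset 608 consists of three blocks: a first pulse excitation codebook 601, a dispersion vector store 602, and a dispersion vector convolution processor 603. The first pulse excitation codebook 601 generates predetermined pulse excitation vectors (for example, vectors of two pulses). The dispersion vector store 602 stores dispersion vectors designed specifically for the pulse excitation codebook 601. The dispersion vector convolution processor 603 convolves the pulse excitation vector output from the first pulse excitation codebook 601 with the dispersion vector output from the dispersion vector store 602.
Likewise, the second fixed excitation codebook subset 609 consists of three blocks: a second pulse excitation codebook 604 (which, unlike the first pulse excitation codebook 601, generates pulse excitation vectors of, for example, three or five pulses), a dispersion vector store 605, and a dispersion vector convolution processor 606.
The dispersion vector store in each fixed excitation codebook subset is designed specifically for that subset's pulse excitation codebook, so the stored dispersion vectors differ between subsets.
Although this embodiment uses two fixed excitation codebook subsets, the present invention places no limit on the number; three or more give the same effect.
The pulse excitation codebooks in the subsets may differ in the number of excitation pulses per excitation vector or in the pattern of excitation pulses (for example, one pulse excitation codebook generates only combinations of pulses close together, while another generates only combinations of pulses far apart).
In either case, the improvement is greater when each subset generates excitation vectors with different characteristics. A selector switch 607 selects one of the fixed excitation vectors output from the dispersion vector convolution processor 603 or the dispersion vector convolution processor 606.
This fixed excitation codebook generates the fixed excitation vector specified by the signal (F) from the parameter determination unit 212 in either the first fixed excitation codebook subset 608 or the second fixed excitation codebook subset 609, and outputs it as the fixed excitation vector via the switch 607.
FIG. 15 is a flowchart of the procedure for searching the fixed excitation codebook of FIG. 14.
First, in ST701, the first fixed excitation codebook subset is searched and the fixed excitation vector that minimizes the quantization error is selected.
Next, in ST702, the second fixed excitation codebook subset is searched; if it contains a fixed excitation vector with a smaller quantization error than the one selected in ST701, that vector is selected as the final fixed excitation vector.
ST701 and ST702 differ only in that different dispersion vectors are applied to different fixed excitation codebooks; the search method itself is the same as the conventional technique described above. The different fixed excitation codebooks are prepared so that the excitation code vectors they generate have different characteristics (for example, a different number of excitation pulses).
For example, fixed excitation codebook subsets with different numbers of pulses are prepared: the first subset generates excitation vectors made up of two pulses, and the second generates fixed excitation vectors made up of five pulses. Alternatively, subsets that combine pulses differently are prepared: the first subset generates fixed excitation vectors in which the pulses are close together, and the second generates fixed excitation vectors in which the pulses are spread over the whole vector. (For example, both subsets generate excitation vectors with the same number of pulses, but the first generates vectors in which all pulses lie within a range of a predetermined number of samples M (for example, 2 to 10 samples), while the second generates vectors in which every pulse spacing is at least a predetermined number of samples M' (for example, 10 samples).)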
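The two placement constraints in the last example can be written as simple predicates on the pulse positions. This is only a sketch: the function names are hypothetical, and reading "within a range of M samples" as a window of M consecutive samples is an assumption.

```python
def all_within_window(positions, m):
    """First subset: every pulse lies inside a window of m consecutive samples (assumed reading)."""
    return max(positions) - min(positions) < m


def all_spaced_at_least(positions, m_prime):
    """Second subset: every gap between neighbouring pulses is at least m_prime samples."""
    ordered = sorted(positions)
    return all(b - a >= m_prime for a, b in zip(ordered, ordered[1:]))
```

With M equal to M', a vector with two or more distinct pulse positions cannot satisfy both predicates, so the two subsets generate disjoint sets of pulse patterns and each can have dispersion vectors tuned to its own pattern.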
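Applying dedicated dispersion vectors to excitation vectors of frequently used specific shapes thus improves the quality of decoded speech efficiently. Alternatively, applying different dispersion vectors according to the characteristics of the pulse excitation vector improves the quality of decoded speech efficiently.
When multiple dedicated dispersion vectors are prepared only for pulse excitation vectors of frequently used specific shapes, the increase in the number of dispersion vector patterns is hardly a problem, and neither is the effort of designing the patterns.
At the same time, the quality of decoded speech can be improved very effectively (efficiently). Preparing many dispersion vectors that do not contribute to actual sound quality is wasted processing; the present invention obtains the sound quality improvement efficiently by adding a small number of dedicated dispersion patterns (additional dispersion vectors).
The fixed excitation codebook described above can of course be implemented in hardware. It can also be implemented by storing the necessary vector data in a database and using that data to generate the waveform data of the fixed excitation vector in software as needed.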
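(Embodiment 2)
A digital filter with a high-band emphasis function has conventionally been placed in the signal processing stage after the synthesis filter. This filter is generally a high-pass filter expressed as a first-order digital filter, as shown in, for example, J-H. Chen and A. Gersho, "Adaptive Postfiltering for Quality Enhancement of Coded Speech", IEEE Trans. Speech & Audio Processing, Vol. 3, No. 1, Jan. 1995.
In contrast, the feature of this embodiment is that, on the speech decoding side, a novel high-band emphasis process is applied to the signal before it passes through the synthesis filter.
FIG. 16 is a block diagram showing the configuration of the speech decoder 111 of FIG. 2.
In FIG. 16, the multiplexed encoded information output from the RF demodulator 110 is separated into individual codes by a demultiplexer 801. The separated LPC code (L) is output to an LPC decoding unit 802, the separated adaptive excitation vector code (A) to an adaptive excitation codebook 805, the separated excitation gain code (G) to a quantization gain generator 806, and the separated fixed excitation vector code (F) to a fixed excitation codebook 807.
The LPC decoding unit 802 decodes the LPC from the code (L) output by the demultiplexer 801 and outputs it to a synthesis filter 803. The adaptive excitation codebook 805 extracts, from the past driving excitation samples specified by the code (A) output by the demultiplexer 801, one frame of samples as the adaptive excitation vector and outputs it to a multiplier 808.
The quantization gain generator 806 decodes the adaptive excitation vector gain and the fixed excitation vector gain specified by the excitation gain code (G) output by the demultiplexer 801 and outputs them to multipliers 808 and 809.
The fixed excitation codebook 807 generates the fixed excitation vector specified by the code (F) output by the demultiplexer 801 and outputs it to the multiplier 809.
The multiplier 808 multiplies the adaptive excitation vector by the adaptive excitation vector gain and outputs the result to an adder 810. The multiplier 809 multiplies the fixed excitation vector by the fixed excitation vector gain and outputs the result to the adder 810.
The adder 810 adds the gain-scaled adaptive excitation vector and fixed excitation vector output from the multipliers 808 and 809 to generate the driving excitation vector, and outputs it to a high-band emphasis unit 811.
The high-band emphasis unit (high-band emphasis postfilter) 811 applies a novel high-band emphasis process to the driving excitation vector (for example, one in which the degree of amplitude emphasis increases with frequency) and outputs the emphasized signal to the synthesis filter 803. The high-band emphasis unit 811 is described in detail later.
The synthesis filter 803 uses the excitation vector output from the high-band emphasis unit 811 as the driving signal, performs synthesis filtering with the filter coefficients decoded by the LPC decoding unit 802, and outputs the synthesized signal to a post-processing unit 804.
The post-processing unit 804 applies processing that improves the subjective quality of speech, such as formant emphasis and pitch emphasis, and processing that improves the subjective quality of stationary noise, and outputs the result to the D/A converter 112 as the final decoded speech signal.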
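Next, the high-band emphasis process is described in detail with reference to FIG. 17.
In CELP coding, the high-frequency components of the decoded signal generally tend to be attenuated. The tendency is stronger at low bit rates, so emphasizing the high-band components of the decoded signal can improve subjective quality to some extent.
In the high-band emphasis unit (high-band emphasis postfilter) 811 of FIG. 17, the excitation vector is input to a high-pass filter (HPF) 901, an adder 902, and an adder 903.
The high-pass filter 901 extracts the band components to be emphasized. The components of the driving excitation vector above the cutoff frequency of the high-pass filter 901 are output to the adder 903, a log power calculator 904, and a multiplier 906.
The adder 903 subtracts the high-band component of the excitation vector from the excitation vector and outputs the result to a log power calculator 905.
The log power calculator 904 computes the log power of the high-band component of the excitation vector and outputs it to a power ratio calculator 907. The log power calculator 905 computes the log power of the signal obtained by removing the high-band component from the excitation vector and outputs it to the power ratio calculator 907.
The power ratio calculator 907 computes the log power ratio between the high-band component of the excitation vector and the remaining component and outputs it to an emphasis coefficient calculator 908.
The emphasis coefficient calculator 908 computes the coefficient by which the high-band component of the excitation vector should be multiplied (emphasis coefficient Rr) so that, as a rule, the log power ratio stays constant.
Specifically, letting Eh[i] be the signal output from the log power calculator 904, El[i] the signal output from the log power calculator 905, and L the subframe length, the log power ratio R output from the power ratio calculator 907 is given by equation (1) below.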

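To bring this log power ratio R to a constant value Cr (for example, 0.42), the emphasis coefficient calculator 908 obtains the coefficient Rr by equation (2) below as the ratio (log power ratio) between Cr and R.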

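A limiter 909 sets an upper limit (for example, 0.3) and a lower limit (for example, 0) for the coefficient Rr. If the coefficient Rr computed by the emphasis coefficient calculator 908 is larger than the upper limit, Rr is set to the upper limit; if it is smaller than the lower limit, Rr is set to the lower limit.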
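A smoothing circuit 910 smooths the emphasis coefficient Rr over time (between samples and/or between subframes) so that its value changes smoothly from subframe to subframe and from sample to sample.
Specifically, the log power ratio is first converted back to the linear domain and 1 is subtracted, as shown in equation (3) below. This is because the result is added to the original excitation signal (from the adder 810), from which the high-band component has not been removed, so only the portion exceeding 1.0 should be added.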

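Rr1 is then smoothed as in equation (4) below so that it changes smoothly between (sub)frames. The smoothing coefficient α is set so that the smoothing is not very strong (for example, α = 0.3).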

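Furthermore, when the smoothed emphasis coefficient Rr1' is multiplied by the output signal exh[i] of the high-pass filter 901 and the product is added to the excitation vector ex[i], Rr1' is smoothed sample by sample according to equation (5) below to give Rr1''. This smoothing is strong (for example, β = 0.9).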

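The multiplier 906 multiplies the high-band component exh[i] of the excitation vector, which is the output of the high-pass filter 901, by the emphasis coefficient Rr1'' smoothed by the smoothing circuit 910.
The adder 902 adds the resulting high-band component signal Rr1'' × exh[i] to the excitation vector ex[i] and outputs the sum, exn[i], to the synthesis filter 803.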
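The limiter, linearization, and two smoothing stages can be sketched as follows. This is a hedged illustration, not the patented formulas: the numbered equations are not reproduced in this text, so base-10 logarithms, the first-order recursive form of both smoothers, and the limiter bounds (0 as lower, 0.3 as upper) are all assumptions, as are the function names.

```python
def emphasis_gains(rr, prev_subframe_gain, prev_sample_gain, length,
                   lower=0.0, upper=0.3, alpha=0.3, beta=0.9):
    """Return (subframe gain Rr1', per-sample gains Rr1'') for one subframe."""
    rr = min(max(rr, lower), upper)                 # limiter 909
    rr1 = 10.0 ** rr - 1.0                          # back to linear domain, minus 1 (assumed base 10)
    rr1_sub = alpha * prev_subframe_gain + (1.0 - alpha) * rr1   # assumed form of the subframe smoothing
    gains = []
    g = prev_sample_gain
    for _ in range(length):
        g = beta * g + (1.0 - beta) * rr1_sub       # assumed form of the per-sample smoothing
        gains.append(g)
    return rr1_sub, gains


def apply_emphasis(ex, exh, gains):
    """exn[i] = ex[i] + Rr1''[i] * exh[i] (multiplier 906 and adder 902)."""
    return [x + g * h for x, h, g in zip(ex, exh, gains)]
```

With β close to 1 the per-sample gain approaches its target slowly, which avoids audible steps at subframe boundaries; the caller would carry the last subframe gain and the last per-sample gain over to the next subframe.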
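The signal exn[i] may be output to the synthesis filter 803 as it is, but it is more common to scale it so that it has the same energy as the original excitation vector ex[i]. This scaling may be performed after the adder 902, or Rr1'' may be computed with the scaling taken into account. In the latter case, an input line from the high-pass filter 901 to the smoothing circuit 910 is required. In the former case, a scaling unit is placed between the adder 902 and the synthesis filter 803, and it receives the excitation vector (from the adder 810) and the high-band emphasized excitation vector (from the adder 902).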
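Scaling after the adder 902 can be sketched as an energy-matching gain. This is one straightforward reading of "same energy as the original excitation vector"; the function name and the handling of an all-zero vector are assumptions.

```python
import math


def match_energy(ex, exn):
    """Scale exn so that its energy equals that of the original excitation ex."""
    energy_in = sum(x * x for x in ex)
    energy_out = sum(x * x for x in exn)
    if energy_out == 0.0:
        return list(exn)  # nothing to scale
    scale = math.sqrt(energy_in / energy_out)
    return [scale * x for x in exn]
```

Matching energies keeps the overall loudness of the decoded speech unchanged, so the emphasis alters only the spectral balance of the excitation.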
The specific processing is as follows.
(When scaling is performed after the adder 902)

(When the scaling is included in Rr1'')

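The characteristics of the high-pass filter 901 are adjusted so that the subjective quality of the decoded speech signal is best. Specifically, for a sampling frequency of 8 kHz, a second-order IIR filter with a cutoff frequency of around 3 kHz is suitable. In the embodiments of the present invention, the cutoff frequency can be designed freely according to the excitation coding characteristics of the encoder. The order of the high-pass filter can likewise be designed freely according to the required filter characteristics and the permissible amount of computation.
By performing high-band emphasis with a digital filter that has a novel transfer function in this way, the gain drop of the excitation signal at high frequencies can be compensated to obtain a flat characteristic. This realizes a filter characteristic that is effective for perceived quality and effectively improves the quality of decoded speech. For example, high-band emphasis prevents the decoded speech from sounding muffled.
In addition, placing this high-band emphasis postfilter before the synthesis filter is simple, so the present invention is easy to apply to actual products.
As described above, the present invention improves the quality of decoded speech efficiently with a minimal addition of hardware or the like. The present invention also improves the performance of a fixed excitation codebook with a pulse dispersion structure. Furthermore, it effectively compensates for the high-band attenuation of the excitation vector in CELP coding and improves subjective quality.
The fixed vector generation method, CELP speech encoding method, and CELP speech decoding method of the present invention can each be realized by installing a program from a communication line or from a CD or other storage medium and executing it with a control means such as a CPU.
This specification is based on Japanese Patent Application No. 2002-043878, filed on February 20, 2002, the content of which is incorporated herein.
Industrial Applicability
The present invention is suitable for use in a CELP speech encoder or a CELP speech decoder.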
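Brief Description of the Drawings
FIG. 1 is a block diagram showing an example of the configuration of a conventional fixed excitation codebook with a pulse dispersion structure.
FIG. 2 is a diagram outlining the overall configuration of the speech signal transmitter and the speech signal receiver of the present invention.
FIG. 3 is a block diagram showing the configuration of the speech encoder according to Embodiment 1 of the present invention.
FIG. 4 is a block diagram showing the configuration of the fixed excitation codebook according to Embodiment 1 of the present invention.
FIG. 5A is a diagram showing the usage frequency distribution of pulse excitation vectors according to Embodiment 1 of the present invention.
FIG. 5B is a diagram showing the usage frequency distribution of pulse excitation vectors according to Embodiment 1 of the present invention.
FIG. 6 is a diagram showing an example of additional dispersion vectors according to Embodiment 1 of the present invention.
FIG. 7 is a diagram showing an example of additional dispersion vectors according to Embodiment 1 of the present invention.
FIG. 8 is a diagram showing an example of additional dispersion vectors according to Embodiment 1 of the present invention.
FIG. 9 is a diagram showing an example of additional dispersion vectors according to Embodiment 1 of the present invention.
FIG. 10 is a diagram showing an example of additional dispersion vectors according to Embodiment 1 of the present invention.
FIG. 11 is a diagram showing an example of a basic dispersion vector according to Embodiment 1 of the present invention.
FIG. 12 is a diagram illustrating in detail the selection processing of the dispersion vector store according to Embodiment 1 of the present invention.
FIG. 13 is a flowchart showing the processing procedure of the fixed excitation codebook according to Embodiment 1 of the present invention.
FIG. 14 is a block diagram showing another configuration of the fixed excitation codebook according to Embodiment 1 of the present invention.
FIG. 15 is a flowchart showing the procedure for searching the fixed excitation codebook according to Embodiment 1 of the present invention.
FIG. 16 is a block diagram showing the configuration of the speech decoder according to Embodiment 2 of the present invention.
FIG. 17 is a block diagram showing the configuration of the high-band emphasis unit according to Embodiment 2 of the present invention.
Technical field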
The present invention relates to a fixed excitation vector generation method and a fixed excitation codebook used in a CELP speech coding apparatus or a CELP speech decoding apparatus.
Background art
In fields such as digital mobile communications, packet communications represented by Internet communications, and voice storage, voice information is compressed and encoded with high efficiency for effective use of transmission path capacity such as radio waves and storage media. For this purpose, a speech encoding device is used.
Among them, methods based on CELP (Code Excited Linear Prediction) have been widely put into practical use at medium and low bit rates. For the CELP technique using a pulse excitation as the driving excitation signal, see M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High-quality Speech at Very Low Bit Rates", Proc. ICASSP-85, 25.1.1, pp. 937-940, 1985.
In the CELP speech coding method, a digitized speech signal is divided into frames of fixed length (about 5 ms to 50 ms), linear prediction of the speech is performed for each frame, and the prediction residual (excitation signal) of the per-frame linear prediction is encoded using an adaptive codebook and a noise (fixed) codebook made up of known waveforms.
The adaptive codebook stores drive excitation signals generated in the past, and is used to represent the periodic component of the audio signal. The fixed codebook stores a vector having a predetermined number of predetermined shapes prepared in advance, and is mainly used to express aperiodic components that cannot be expressed by the adaptive codebook.
As a vector stored in the fixed codebook, a vector composed of a random noise sequence, a vector expressed by a combination of several pulses, or the like is used.
An algebraic fixed codebook is one of the typical fixed codebooks that express a vector by a combination of several pulses. Specific contents of the algebraic fixed codebook are shown in “ITU-T recommendation G.729” and the like. The algebraic fixed codebook has an advantage that a fixed excitation codebook can be searched with a small amount of calculation and the capacity of a ROM for storing excitation vectors can be reduced. On the other hand, however, there is a problem that faithful code representation of noise components is difficult.
One of the methods for solving the problems of the algebraic fixed codebook is a technique using a dispersed-pulse codebook. Pulse dispersion is disclosed in "ITU-T Recommendation G.729 Annex D" and the like. Pulse dispersion is a method of generating a fixed excitation vector by convolving a dispersion pattern (fixed waveform) with an excitation vector.
FIG. 1 is a block diagram showing an example of a configuration of a fixed excitation codebook having a conventional pulse dispersion structure. The dispersed-pulse codebook 10 includes a pulse excitation codebook 11, a dispersion vector convolution processor 12, and a dispersion vector store 13.
A pulse excitation vector is output from the pulse excitation codebook 11, and the dispersion vector convolution processor 12 convolves it with the dispersion vector taken from the dispersion vector store 13, whereby a fixed excitation vector (noise excitation vector) is generated.
With conventional pulse dispersion, it is possible to improve the performance of a pulse excitation codebook at a low bit rate, for example, 4 kbit/s or less.
However, in next-generation mobile phone systems, for example, there is a demand for a still larger quality improvement (that is, further improving the quality of the decoded speech), and it is difficult to satisfy this requirement with existing technology.
For example, simply increasing the number of dispersion vector patterns does not improve the quality of the decoded speech in proportion, and the increase in patterns may also increase memory capacity and complicate signal processing.
Disclosure of the invention
An object of the present invention is to provide a technique capable of further improving the quality of decoded speech by improving the speech quality on the speech encoding or decoding side, and of restoring speech that is more natural and easier for the user to listen to.
On the speech encoding side, this object is achieved, with regard to generation of a fixed excitation vector, by selecting in advance from among a large number of pulse excitation vectors those having, for example, a specific shape that is frequently used, and by preparing dedicated dispersion vectors corresponding to the selected pulse excitation vectors.
On the speech decoding side, it is achieved by applying, for example, high-band emphasis processing with novel characteristics to the excitation signal (a signal simulating the sound produced by the human vocal cords) before it is input to the synthesis filter (which has a function simulating the human vocal tract).
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
First, the outline of the overall configuration of the audio signal transmitting apparatus and the audio signal receiving apparatus in the present invention will be described with reference to FIG.
In FIG. 2, the audio signal 101 is converted into an electrical signal by the input device 102 and output to the A / D conversion device 103. The A / D conversion device 103 converts the (analog) signal output from the input device 102 into a digital signal and outputs it to the speech encoding device 104. The speech encoding device 104 encodes the digital speech signal output from the A / D conversion device 103 using a speech encoding method to be described later, and outputs the encoded information to the RF modulation device 105. The RF modulation device 105 converts the speech coding information output from the speech coding device 104 into a signal for transmission on a propagation medium such as a radio wave and outputs the signal to the transmission antenna 106. The transmission antenna 106 transmits the output signal output from the RF modulation device 105 as a radio wave (RF signal). In the figure, an RF signal 107 represents a radio wave (RF signal) transmitted from the transmission antenna 106. The above is the configuration and operation of the audio signal transmitting apparatus.
The RF signal 108 is received by the receiving antenna 109 and output to the RF demodulator 110. Note that the RF signal 108 in the figure represents the radio wave received by the receiving antenna 109 and is exactly the same as the RF signal 107 if there is no signal attenuation or noise superposition in the propagation path.
The RF demodulator 110 demodulates speech coding information from the RF signal output from the receiving antenna 109 and outputs the demodulated speech information to the speech decoder 111. The audio decoding device 111 decodes the audio signal from the audio encoding information output from the RF demodulation device 110 using an audio decoding method described later, and outputs the audio signal to the D / A conversion device 112. The D / A converter 112 converts the digital audio signal output from the audio decoder 111 into an analog electrical signal and outputs it to the output device 113.
The output device 113 converts an electrical signal into air vibration and outputs it as a sound wave so that it can be heard by a human ear. In the figure, reference numeral 114 represents the outputted sound wave. The above is the configuration and operation of the audio signal receiving apparatus.
By including at least one of the above-described audio signal transmitting apparatus and receiving apparatus, a base station apparatus and a mobile terminal apparatus in a mobile communication system can be configured.
Hereinafter, the improvement of the generation of the fixed excitation vector using dispersion vectors on the speech encoding side (Embodiment 1) and the high-band emphasis processing on the speech decoding side (Embodiment 2) are described in detail, in that order, with reference to the drawings.
(Embodiment 1)
In the first embodiment, the fixed excitation codebook is provided with dedicated dispersion vectors (hereinafter referred to as "additional dispersion vectors") used for pulse excitation vectors having predetermined shapes, and a case is described in which the optimum dispersion vector is applied according to the shape of the pulse excitation vector.
FIG. 3 is a block diagram showing a configuration of speech encoding apparatus 104 installed in the speech signal transmitting apparatus of FIG.
The input signal of the speech encoding device 104 is a signal output from the A / D conversion device 103 and is input to the preprocessing unit 200. The pre-processing unit 200 performs a waveform shaping process and a pre-emphasis process that lead to performance improvement of a high-pass filter process for removing a DC component and a subsequent encoding process, and an LPC analysis unit 201 And output to the adder 204.
The LPC analysis unit 201 performs linear prediction analysis using Xin, and outputs the analysis result (linear prediction coefficient) to the LPC quantization unit 202. The LPC quantization unit 202 quantizes the linear prediction coefficient (LPC) output from the LPC analysis unit 201, outputs the quantized LPC to the synthesis filter 203, and multiplexes a code L representing the quantized LPC To 213.
The synthesis filter 203 generates a synthesized signal by performing filter synthesis on a driving sound source output from the adder 210 described later using a filter coefficient based on the quantized LPC, and outputs the synthesized signal to the adder 204.
The adder 204 calculates an error signal between Xin and the synthesized signal and outputs it to the perceptual weighting unit 211. The perceptual weighting unit 211 applies perceptual weighting to the error signal output from the adder 204, calculates the distortion between Xin and the synthesized signal in the perceptually weighted domain, and outputs the distortion to the parameter determination unit 212.
The parameter determination unit 212 selects, from the adaptive excitation codebook 205, the fixed excitation codebook 207, and the quantization gain generation unit 206, respectively, the adaptive excitation vector, fixed excitation vector, and quantization gain that minimize the coding distortion output from the perceptual weighting unit 211, and outputs the adaptive excitation vector code (A), excitation gain code (G), and fixed excitation vector code (F) indicating the selection result to the multiplexing unit 213. In addition, when the shape of the pulse excitation vector selected in the fixed excitation codebook 207 is one of the specific shapes set in advance, the parameter determination unit 212 checks whether the set of additional diffusion vectors prepared exclusively for that vector contains a diffusion vector that makes the quantization error smaller than the basic diffusion vector does, selects from the basic diffusion vector and the additional diffusion vectors the one that minimizes the quantization error, and outputs a control signal indicating the selection result to the fixed excitation codebook 207.
The adaptive excitation codebook 205 buffers the drive excitation signals output by the adder 210 in the past, cuts out one frame of samples from the past drive excitation signal samples specified by the signal output from the parameter determination unit 212 as an adaptive excitation vector, and outputs it to the multiplier 208.
The quantization gain generation unit 206 outputs the adaptive excitation gain and the fixed excitation gain specified by the signal output from the parameter determination unit 212 to the multipliers 208 and 209, respectively.
The fixed excitation codebook 207 outputs to the multiplier 209 a fixed excitation vector obtained by convolving a diffusion vector with a pulse excitation vector having the shape specified by the signal output from the parameter determination unit 212. The configuration of the fixed excitation codebook 207 is the characteristic part of the present embodiment and is described in detail later.
Multiplier 208 multiplies the adaptive excitation vector output from adaptive excitation codebook 205 by the quantized adaptive excitation gain output from quantization gain generator 206 and outputs the result to adder 210.
Multiplier 209 multiplies the fixed excitation vector output from fixed excitation codebook 207 by the quantized fixed excitation gain output from quantization gain generation section 206 and outputs the result to adder 210.
The adder 210 receives the gain-multiplied adaptive excitation vector and fixed excitation vector from the multipliers 208 and 209, respectively, adds these vectors, and outputs the resulting drive excitation to the synthesis filter 203 and the adaptive excitation codebook 205.
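The role of the multipliers 208 and 209 and the adder 210 can be illustrated with a minimal sketch. The function name and the plain-list representation of the vectors are assumptions made for illustration only and do not appear in the embodiment.

```python
def drive_excitation(adaptive_vec, fixed_vec, adaptive_gain, fixed_gain):
    """Gain-scale both excitation vectors and add them sample by sample.

    Hypothetical helper: mirrors multiplier 208, multiplier 209 and adder 210.
    """
    if len(adaptive_vec) != len(fixed_vec):
        raise ValueError("both excitation vectors must cover the same frame")
    return [adaptive_gain * a + fixed_gain * f
            for a, f in zip(adaptive_vec, fixed_vec)]


# Toy two-sample frame, values chosen only for illustration.
example = drive_excitation([1.0, 0.0], [0.0, 2.0], 0.5, 2.0)
```

In the encoder, the resulting vector would be fed both to the synthesis filter 203 and back into the buffer of the adaptive excitation codebook 205.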
The multiplexing unit 213 receives the code (L) representing the quantized LPC from the LPC quantization unit 202 and receives the code (A) representing the adaptive excitation vector, the code (F) representing the fixed excitation vector, and the code (G) representing the quantization gain from the parameter determination unit 212, multiplexes this information, and outputs it to the transmission line as encoded information.
The above is the description of each component of the speech encoding device 104.
Next, a specific configuration and characteristics of fixed excitation codebook 207 will be described with reference to the drawings.
FIG. 4 is a block diagram showing a configuration of fixed excitation codebook 207 of FIG.
In FIG. 4, a pulse excitation codebook 301 outputs a pulse excitation vector to a pulse excitation vector shape determiner 302 and a spread vector convolution processor 303, respectively.
The pulse excitation vector shape determiner 302 stores predetermined vector shapes in a memory in association with parameters that specify each vector shape. When the pulse excitation vector is composed of only a few pulses, its shape is specified by the inter-pulse distance (how many samples apart the pulses are) and the polarity relationship between the pulses (different polarity or same polarity). In this case, the inter-pulse distance and the polarity relationship are the parameters.
The pulse excitation vector shape determiner 302 then compares the parameters of the pulse excitation vector output from the pulse excitation codebook 301 with the stored parameters of each vector shape and, for example, determines that the vectors have the same shape when all the parameters match. When the pulse excitation vector is composed of only a few pulses, the pulse excitation vector shape determiner 302 determines that vectors have the same shape as long as the relative positions and the polarity relationship of the pulses are the same. A vector with the same pulse interval and the same polarity relationship that is shifted along the time axis, or a vector whose magnitude (pulse amplitude) is multiplied by a constant, is therefore also determined to have the same shape.
When a vector having the same shape exists, the pulse excitation vector shape determiner 302 outputs a control signal to the diffusion vector storage 304 so as to output an additional diffusion vector designed exclusively for the pulse excitation vector having that shape. On the other hand, the pulse sound source vector shape determiner 302 outputs a control signal to the diffusion vector storage 304 so as to output a basic diffusion vector when vectors having the same shape do not exist.
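The shape determination for a two-pulse vector can be sketched as follows. This is only an illustration under stated assumptions: the function names, the (position, amplitude) pair representation, and the use of the subset numbers 400 to 405 of FIG. 12 as return values are choices made here, and the registered shapes are the five frequently used shapes discussed later in this embodiment.

```python
def shape_parameters(pulses):
    """Return (inter-pulse distance, same polarity?) for a two-pulse vector.

    `pulses` is a list of two (position, amplitude) pairs. Only the relative
    position and the sign relationship are used, so a time shift or a
    constant scaling of the amplitudes does not change the result.
    """
    (pos_a, amp_a), (pos_b, amp_b) = pulses
    return abs(pos_a - pos_b), (amp_a * amp_b > 0)


# Registered shapes -> diffusion vector subset (numbering assumed from FIG. 12).
REGISTERED_SHAPES = {
    (2, True): 401,
    (1, True): 402,
    (0, True): 403,
    (1, False): 404,
    (2, False): 405,
}


def select_subset(pulses):
    """Subset holding the additional diffusion vectors, or 400 (basic only)."""
    return REGISTERED_SHAPES.get(shape_parameters(pulses), 400)
```

For example, pulses at samples 11 and 12 with opposite signs and pulses at samples 30 and 31 with opposite signs map to the same subset, because only the distance and the polarity relationship are compared.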
The diffusion vector storage 304 stores in a memory, in addition to the basic diffusion vector used in common for all pulse excitation vectors, the additional diffusion vectors used for pulse excitation vectors having predetermined shapes. The diffusion vector output to the diffusion vector convolution processor 303 is switched by the control signal from the parameter determination unit 212 and the control signal from the pulse excitation vector shape determiner 302. That is, the diffusion vector storage 304 selects a diffusion vector corresponding to the pulse excitation vector shape determined by the pulse excitation vector shape determiner 302 and outputs it to the diffusion vector convolution processor 303.
The spread vector convolution processor 303 convolves the spread vector extracted from the spread vector storage 304 with the pulse excitation vector output from the pulse excitation codebook 301. Thereby, a fixed sound source vector (noise sound source vector) is generated.
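As a rough illustration of this convolution, the sketch below spreads each pulse of a sparse pulse excitation vector with a diffusion vector. Truncating the result at the end of the frame is an assumption made here (the text does not state how the frame boundary is handled), and the function name and the toy vectors are likewise hypothetical.

```python
def convolve_diffusion(pulse_vec, diffusion_vec):
    """Convolve a pulse excitation vector with a diffusion vector.

    The output has the same length as `pulse_vec`; contributions that would
    fall beyond the frame are dropped (assumption, see lead-in).
    """
    frame_len = len(pulse_vec)
    out = [0.0] * frame_len
    for i, amp in enumerate(pulse_vec):
        if amp == 0:
            continue  # only the few non-zero pulses contribute
        for j, coeff in enumerate(diffusion_vec):
            if i + j < frame_len:
                out[i + j] += amp * coeff
    return out


# Two pulses of opposite polarity, each spread over two samples.
fixed_vec = convolve_diffusion([0, 1, 0, -1, 0, 0], [1.0, 0.5])
```

With an impulse as the diffusion vector (`[1.0]`), the output equals the input pulse vector, which corresponds to the "no diffusion" case.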
In this way, the optimum diffusion vector is selected according to the shape of the excitation vector, so the encoding performance can be improved compared with the case where predetermined diffusion vectors (one or more types of basic diffusion vectors) are convolved with all pulse excitation vectors.
Any number of vector shapes may be stored in the memory of the pulse excitation vector shape determiner 302. However, by preparing additional diffusion vectors only for excitation vectors having specific, frequently used shapes, the number of stored shapes is narrowed down and the increase in ROM capacity caused by introducing the additional diffusion vectors can be suppressed.
Hereinafter, a method for selecting a sound source vector of a specific shape with high use frequency, which is stored in the memory of the pulse sound source vector shape determiner 302 a priori, and a method for selecting an additional diffusion vector applied thereto will be described in detail.
FIGS. 5A and 5B show the distribution of the use frequency of the pulse excitation vectors (in the case of two pulses) output from the pulse excitation codebook 301, with the inter-pulse distance and the polarity of each pulse as parameters. The distribution was tabulated by actually encoding several hours of speech data. FIG. 5B is an enlargement of FIG. 5A along the horizontal axis. In FIGS. 5A and 5B, the horizontal axis indicates the inter-pulse distance (in samples), and the vertical axis indicates the normalized use frequency of the excitation vectors having that inter-pulse distance. The origin corresponds to the case where the two pulses overlap and the excitation vector consists of one pulse; the left side of the origin represents combinations of pulses with different polarities, and the right side represents combinations with the same polarity.
The normalized use frequency is the number of times the pulse excitation vectors with a given interval were used divided by the number of pulse combinations having that interval. For example, an interval of 1 sample can be realized by many combinations, such as the first pulse at sample 1 and the second pulse at sample 2, or the first pulse at sample 2 and the second pulse at sample 3, so the use count is normalized by the number of all such combinations that the pulse excitation codebook can generate.
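The normalization can be sketched as below. The sketch assumes a codebook that can place two pulses anywhere in a frame of 80 samples (the frame length of the worked example later in this embodiment), so that an interval of d samples can be realized in 80 − d ways; the function name and the usage counts are invented purely for illustration and are not measured data.

```python
def normalized_use_frequency(usage_counts, frame_len=80):
    """Divide each usage count by the number of position combinations.

    `usage_counts` maps (inter-pulse distance, same polarity?) to the number
    of times vectors of that shape were selected. A distance of d samples
    can be placed in (frame_len - d) ways within the frame (assumption).
    """
    return {shape: count / (frame_len - shape[0])
            for shape, count in usage_counts.items()}


# Made-up counts, for illustration only.
freq = normalized_use_frequency({(1, True): 790, (2, False): 78})
```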
As apparent from FIGS. 5A and 5B, the frequency of use is concentrated on the sound source vector whose distance between two pulses is within two samples, regardless of the combination of polarities.
Therefore, the five types of excitation vectors whose two pulses are within 2 samples of each other (same-polarity pulses at inter-pulse distance 0, same-polarity pulses at distance 1, different-polarity pulses at distance 1, same-polarity pulses at distance 2, and different-polarity pulses at distance 2) are selected to be stored in the memory of the pulse excitation vector shape determiner 302.
Next, for each selected sound source vector, a dedicated additional diffusion vector is designed by learning.
The diffusion vectors are learned, for example, as shown in section 3.1 of Yasunaga et al., “Dispersed-pulse codebook and its applications to a 4 kb/s speech coder,” Proc. ICASSP2000, pp. 1503-1506, 2000: each diffusion vector is determined based on the generalized Lloyd algorithm so as to minimize the sum of the coding distortions over the training data.
FIGS. 6 to 10 show examples of the designed additional diffusion vectors, in which four types of additional diffusion vectors are designed for each excitation vector shape.
FIG. 6 shows that four types of dedicated diffusion vectors (A1 to A4) are assigned to the sound source vectors having a pulse-to-pulse distance of 2 samples and the same polarity. Similarly, FIG. 7 shows that four types (B1 to B4) of additional diffusion vectors are provided for a sound source vector having an interpulse distance of 1 sample and a pulse polarity of the same polarity. Similarly, FIG. 8, FIG. 9, and FIG. 10 respectively show four types of sound source vectors having the same polarity when the inter-pulse distance is 0 sample, different polarity when the inter-pulse distance is 1 sample, and different polarity when the inter-pulse distance is 2 samples. It is shown that the additional diffusion vector is provided. As apparent from FIGS. 6 to 10, the shapes of the additional diffusion vectors obtained for the five types of pulsed sound source vectors have different characteristics.
If a diffusion vector common to all excitation vectors is learned, the result is a vector whose shape is the average of diffusion vectors having these different characteristics, so the achievable performance improvement is limited. An example of the basic diffusion vector is shown in FIG.
Although FIGS. 6 to 10 are described on the assumption that four types of additional diffusion vectors are assigned to each excitation vector, the present invention is not limited to this. For example, the number of types of additional diffusion vectors shown in FIGS. 6 to 10 may be one.
Although not shown in the figure, a separate additional diffusion vector is provided for each sound source vector having a specific shape that is frequently used even when there are three pulses.
FIG. 12 is a diagram for specifically explaining the contents of the selection process of the diffusion vector storage 304 when the additional diffusion vector is the one shown in FIGS. 6 to 10.
As shown in the figure, the diffusion vector storage 304 is provided with several diffusion vector subsets 400 to 405.
The diffusion vector subset 400 includes a terminal X0 that outputs a basic diffusion vector, and outputs the basic diffusion vector to the diffusion vector convolution processor 303 via the switch 406.
The diffusion vector subset 401 includes terminals A1 to A4 that output the four additional diffusion vectors shown in FIG. 6 and a terminal A0 that outputs the basic diffusion vector. Of the five diffusion vectors A0 to A4, the one determined by the parameter determination unit 212 is selected by the switch 407 and output to the diffusion vector convolution processor 303 via the switch 406.
Similarly, the diffusion vector subsets 402 to 405 include terminals B1 to B4, C1 to C4, D1 to D4, and E1 to E4, respectively, that output their four additional diffusion vectors, and terminals B0, C0, D0, and E0 that output the basic diffusion vector. In each subset, the diffusion vector determined by the parameter determination unit 212 is selected by the switch 408, 409, 410, or 411 and output to the diffusion vector convolution processor 303 via the switch 406.
In FIG. 12, the basic vectors output from the terminals X0, A0, B0, C0, D0, E0 are the same.
The switch 406, which switches among the diffusion vector subsets 400 to 405, is controlled by the pulse excitation vector shape determiner 302 according to the shape of the pulse excitation vector output from the pulse excitation codebook 301. That is, when a pulse excitation vector having one of the specific, frequently used shapes is input from the pulse excitation codebook 301 to the pulse excitation vector shape determiner 302, the switch 406 is connected to the output terminal of whichever of the diffusion vector subsets 401 to 405 corresponds to that shape. When a pulse excitation vector that does not have a specific shape is input, the switch 406 is connected to the output terminal of the diffusion vector subset 400.
The switches 407 to 411 are connected to terminals that output the diffusion vectors determined by the parameter determination unit 212 among the five types of diffusion vectors included in the respective diffusion vector subsets 401 to 405.
With the above configuration, when an excitation vector with the same shape as one stored in the pulse excitation vector shape determiner 302 is output from the pulse excitation codebook 301, the best of five diffusion vectors (the four additional diffusion vectors and the basic diffusion vector) is selected.
In FIG. 12, there are five diffusion vector subsets having additional diffusion vectors. However, the present invention does not limit the number of diffusion vector subsets, which can be increased or decreased as appropriate according to the number of frequently used pulse excitation vector patterns. Likewise, although each diffusion vector subset holds four types of additional diffusion vectors, the present invention does not limit this number.
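The two-level switching of FIG. 12 can be modeled, very roughly, as a lookup by subset number (switch 406) followed by a lookup by terminal number (switches 407 to 411). The data layout, the function names, and especially the vector values below are placeholders assumed for illustration; they are not the learned vectors of FIGS. 6 to 10.

```python
def build_storage(basic, additional_by_subset):
    """Return {subset number: [terminal 0, terminal 1, ...]}.

    Terminal 0 of every subset is the shared basic diffusion vector
    (X0, A0, B0, ... in FIG. 12); subset 400 holds only that vector.
    """
    storage = {400: [basic]}
    for subset, additional in additional_by_subset.items():
        storage[subset] = [basic] + list(additional)
    return storage


def select_diffusion_vector(storage, switch_406, inner_switch=0):
    """Model switch 406 (subset) and the per-subset switch (terminal)."""
    return storage[switch_406][inner_switch]


# Placeholder vectors only: four dummy additional vectors per subset.
BASIC = [1.0, 0.0, 0.0]
ADDITIONAL = {subset: [[1.0, float(k), 0.0] for k in range(1, 5)]
              for subset in range(401, 406)}
STORAGE = build_storage(BASIC, ADDITIONAL)
```

Changing the number of subsets or the number of additional vectors per subset only changes the contents of the dictionary, which matches the remark that neither number is fixed.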
FIG. 13 is a flowchart showing the main part of the processing described above, namely the processing flow of the fixed excitation codebook search.
First, in ST501, a pulse excitation search using the basic diffusion vector is performed. An impulse (that is, no diffusion) may be used as the basic diffusion vector. Specific search methods are disclosed in, for example, Japanese Patent Laid-Open No. 10-63300 (paragraph 17 (prior art) and paragraphs 51 to 54) and section 2.2 of Yasunaga et al., “Dispersed-pulse codebook and its applications to a 4 kb/s speech coder,” Proc. ICASSP2000, pp. 1503-1506, 2000.
Next, in ST502, it is checked whether or not the pulse sound source vector selected in ST501 has a predetermined specific parameter (combination of pulse position and polarity).
These specific shapes are the shapes of those vectors, among the pulse excitation vectors generated from the pulse excitation codebook, that are frequently used as fixed excitation vectors (that is, frequently selected as a result of the search).
More specifically, for a two-pulse excitation, frequently used shapes include, for example, a shape in which the inter-pulse distance is 1 sample (for example, excitation pulses at the 11th and 12th samples) and the pulse polarities have different signs, and a shape in which the distance is 2 samples (for example, excitation pulses at the 20th and 22nd samples) and the pulse polarities have the same sign.
If it is not a sound source vector having such a specific shape, a convolution of the basic diffusion vector with the pulsed sound source vector selected in ST501 is used as the fixed sound source vector.
That is, the switch 406 in FIG. 12 is connected to the terminal X 0 of the diffusion vector subset 400. If the pulsed sound source vector selected in ST501 is a vector having a specific shape, the process proceeds to ST503.
In ST503, it is checked whether the additional diffusion vectors of the diffusion vector subset prepared exclusively for vectors of that specific shape (one of the diffusion vector subsets 401 to 405 in FIG. 12) include a diffusion vector that makes the quantization error smaller than the basic diffusion vector does, and the diffusion vector that minimizes the quantization error is selected from the basic diffusion vector and the additional diffusion vectors. Which diffusion vector subset, and hence which additional diffusion vectors, to use is determined by the pulse excitation vector shape determiner 302.
Then, the convolution of the pulse excitation vector selected in ST501 with the diffusion vector selected in ST502 or ST503 is selected as the fixed excitation code vector.
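The selection in ST502 and ST503 can be sketched as follows, given the pulse excitation vector already chosen in ST501. This is a simplified illustration, not the search of the embodiment: the error measure is a plain gain-optimized squared error against a target vector, without the synthesis filter and perceptual weighting that a real CELP search applies, and all names and toy values are assumptions.

```python
def convolve_diffusion(pulse_vec, diffusion_vec):
    """Truncated convolution of a pulse vector with a diffusion vector."""
    n = len(pulse_vec)
    out = [0.0] * n
    for i, amp in enumerate(pulse_vec):
        for j, coeff in enumerate(diffusion_vec):
            if amp != 0 and i + j < n:
                out[i + j] += amp * coeff
    return out


def gain_optimized_error(target, candidate):
    """Squared error after scaling `candidate` by its optimal gain."""
    target_energy = sum(t * t for t in target)
    energy = sum(c * c for c in candidate)
    if energy == 0:
        return target_energy
    corr = sum(t * c for t, c in zip(target, candidate))
    return target_energy - corr * corr / energy


def choose_diffusion_vector(target, pulse_vec, basic, additional):
    """Return (index, error); index 0 is the basic diffusion vector.

    `additional` is the list of additional diffusion vectors of the subset
    picked by the shape determiner, or an empty list when the pulse vector
    has no registered shape (the ST502 "no" branch).
    """
    best_index, best_error = 0, None
    for index, dvec in enumerate([basic] + list(additional)):
        err = gain_optimized_error(target, convolve_diffusion(pulse_vec, dvec))
        if best_error is None or err < best_error:
            best_index, best_error = index, err
    return best_index, best_error


# Toy frame: the target is better matched by a two-tap diffusion vector.
result = choose_diffusion_vector([0, 1, 0.5, 0], [0, 1, 0, 0], [1.0], [[1.0, 0.5]])
```

Because the basic diffusion vector is always among the candidates, the selected error is never larger than that of the ST501 result.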
A configuration in which a plurality of additional diffusion vectors are prepared exclusively for pulse excitation vectors having specific, frequently used shapes thus requires only a small increase in the amount of information and, depending on the pulse excitation codebook (specifically, a pulse excitation codebook that has unused codes), can in some cases be realized without increasing the number of bits.
Here, encoding and decoding of the fixed excitation codebook generated by the above method are described using a specific example. As an example, consider a case where two pulses are placed in a frame of 80 samples. The two pulses, pulse 1 and pulse 2, can each be placed on any one of the 80 samples, and placing pulse 1 and pulse 2 on the same sample is allowed. In that case, the pulse amplitude is the sum of the amplitudes of pulse 1 and pulse 2; if both pulses have amplitude 1, they become a single pulse of amplitude 2. When the two pulses are placed on different samples, there are 80C2 = 3160 combinations of positions. Since the two pulses can have either the same polarity or different polarities, there are 3160 × 2 = 6320 pulse excitation vector shapes. In addition, there are 80 cases where the two pulses overlap to form one pulse, giving a total of 6400 types of pulse excitation vectors. Finally, since the pulse excitation vector as a whole can take either of two polarities, the number of pulse excitation vectors to be encoded is 6400 × 2 = 12,800, which is less than 2^14 = 16,384 and therefore fits in 14 bits.
These 12,800 vectors can be expressed with 14 bits by adopting the convention that the two pulses have different polarities when pulse 2 is behind pulse 1 and the same polarity when pulse 1 and pulse 2 are at the same position or pulse 2 is ahead of pulse 1, and by expressing the polarity of pulse 1 with 1 bit.
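The counting in this example can be checked with a few lines of arithmetic. The variable names are chosen here for illustration only.

```python
from math import comb

FRAME = 80                                   # samples per frame in the example

distinct_positions = comb(FRAME, 2)          # two pulses on different samples
relative_shapes = distinct_positions * 2     # same or different polarity
all_shapes = relative_shapes + FRAME         # plus 80 overlapping cases
total_vectors = all_shapes * 2               # overall polarity of the vector
bits_needed = (total_vectors - 1).bit_length()
spare_13bit_codes = 2 ** 13 - all_shapes     # codes left over in bits 0 to 12
```

The last value shows how many 13-bit position codes remain unused by the basic scheme and are therefore available for vectors that use an additional diffusion vector.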
Hereinafter, a method of representing the fixed codebook with a 14-bit code will be described. Such an encoding method is disclosed in, for example, 3GPP standard AMR encoding (3GPP TS 26.090, 26.073, and 26.104).
First, a pulse excitation search is performed, and the positions and polarities of pulse 1 and pulse 2 are determined. Next, the positional relationship between pulse 1 and pulse 2 is examined. If pulse 2 is behind pulse 1, it is checked whether pulse 1 and pulse 2 have different polarities; if they do not, pulse 1 and pulse 2 are swapped. Conversely, if pulse 1 and pulse 2 are at the same position or pulse 2 is ahead of pulse 1, it is checked whether pulse 1 and pulse 2 have the same polarity; if they do not, pulse 1 and pulse 2 are swapped.
Pulses 1 and 2 determined in this way are encoded as follows. The 14 bits are bits 0 to 13 (bit 0 is the least significant bit). The most significant bit, bit 13 (= S), is a 1-bit value representing the polarity of pulse 1: 1 for positive and 0 for negative.
Next, the combination of the positions of the two pulses is encoded. For example, if the position of pulse 1 is p1 and the position of pulse 2 is p2 (each from 0 to 79), the code is CF = p1 × 80 + p2, which takes values from 0 to 6399. CF is expressed with the 13 bits (0 to 8191) of bits 0 to 12, so the remaining codes 6400 to 8191 can be assigned to fixed code vectors to which an additional diffusion vector is applied.
The additional diffusion vectors are applied in the following five cases:
(1) Pulse 1 and pulse 2 are 2 samples apart and have the same polarity (78 position patterns)
(2) Pulse 1 and pulse 2 are 1 sample apart and have the same polarity (79 position patterns)
(3) Pulse 1 and pulse 2 are 0 samples apart and have the same polarity (80 position patterns)
(4) Pulse 1 and pulse 2 are 1 sample apart and have different polarities (79 position patterns)
(5) Pulse 1 and pulse 2 are 2 samples apart and have different polarities (78 position patterns)
If four types of additional diffusion vectors are assigned to each of the five types of pulse excitation vectors (1) to (5), codes can be assigned as follows: (1) needs 78 × 4 = 312 codes, 6400 to 6711; (2) needs 79 × 4 = 316 codes, 6712 to 7027; (3) needs 80 × 4 = 320 codes, 7028 to 7347; (4) needs 79 × 4 = 316 codes, 7348 to 7663; and (5) needs 78 × 4 = 312 codes, 7664 to 7975. Specifically, when the number of the additional diffusion vector selected by the search process is dv (= 0 to 3), the code CF is generated according to the case determined by the pulse excitation vector shape determiner:
Case (1): CF = 6400 + 78 × dv + (p1 − 2), (2 ≦ p1 ≦ 79)
Case (2): CF = 6712 + 79 × dv + (p1 − 1), (1 ≦ p1 ≦ 79)
Case (3): CF = 7028 + 80 × dv + p1, (0 ≦ p1 ≦ 79)
Case (4): CF = 7348 + 79 × dv + p1, (0 ≦ p1 ≦ 78)
Case (5): CF = 7664 + 78 × dv + p1, (0 ≦ p1 ≦ 77)
Finally, the transmission code F is generated by adding the most significant bit (F = S × 8192 + CF).
As described above, the position p1 and polarity s1 of the pulse 1, the position p2 and polarity s2 of the pulse 2, and the spreading vector information to be applied are encoded.
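The encoding steps above can be sketched in code as follows. The function name, the table layout, and the use of `dv=None` to mean "basic diffusion vector" are assumptions made for this illustration; the numeric constants are the ones given in the text.

```python
FRAME = 80  # samples per frame in the worked example

# (inter-pulse distance, same polarity?) -> (first code, positions, p1 offset)
ADDITIONAL_CASES = {
    (2, True): (6400, 78, 2),    # case (1)
    (1, True): (6712, 79, 1),    # case (2)
    (0, True): (7028, 80, 0),    # case (3)
    (1, False): (7348, 79, 0),   # case (4)
    (2, False): (7664, 78, 0),   # case (5)
}


def encode_fixed_code(p1, s1, p2, s2, dv=None):
    """Return the 14-bit code F for two pulses (positions 0..79, signs +1/-1).

    dv=None selects the basic diffusion vector; dv in 0..3 selects an
    additional diffusion vector and requires one of the five special shapes.
    """
    same = s1 == s2
    if p1 == p2 and not same:
        raise ValueError("overlapping pulses of opposite polarity cancel out")
    # Canonical order: pulse 2 behind pulse 1 <=> different polarities.
    if (p2 > p1) == same:
        p1, s1, p2, s2 = p2, s2, p1, s1
    sign_bit = 1 if s1 > 0 else 0
    if dv is None:
        cf = p1 * FRAME + p2
    else:
        base, count, offset = ADDITIONAL_CASES[(abs(p1 - p2), same)]
        cf = base + count * dv + (p1 - offset)
    return sign_bit * 8192 + cf
```

For instance, pulses at samples 11 (positive) and 12 (negative) with the basic diffusion vector give CF = 11 × 80 + 12 = 892 and F = 8192 + 892 = 9084.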
Next, decoding of the decoder that has received the transmission code F will be described. In the decoder, the two pulse positions (p1, p2) and polarity (s1, s2) are decoded in the following procedure.
First, the polarity information S is decoded from the received code F.
S = ((F >> 13) & 1) × 2 − 1 (S is −1 or +1)
Next, the pulse position information code CF is decoded.
CF = F & 0x1FFF
Next, processing is switched as follows according to the value of CF.
(1) When CF is less than 6400
p2 = CF % 80, p1 = (CF − p2) ÷ 80
s1 = S, s2 = −S (when p2 > p1) or s2 = +S (when p2 ≦ p1)
The basic diffusion vector is used as the diffusion vector.
(2) When CF is 6400 or more and less than 6712
p1 = (CF − 6400) % 78 + 2, p2 = p1 − 2, s1 = s2 = S
dv = ((CF − 6400) − (p1 − 2)) ÷ 78
The dv-th additional diffusion vector of subset 1 (FIG. 6) is used.
(3) When CF is 6712 or more and less than 7028
p1 = (CF − 6712) % 79 + 1, p2 = p1 − 1, s1 = s2 = S
dv = ((CF − 6712) − (p1 − 1)) ÷ 79
The dv-th additional diffusion vector of subset 2 (FIG. 7) is used.
(4) When CF is 7028 or more and less than 7348
p1 = (CF − 7028) % 80, p2 = p1, s1 = s2 = S
dv = ((CF − 7028) − p1) ÷ 80
The dv-th additional diffusion vector of subset 3 (FIG. 8) is used.
(5) When CF is 7348 or more and less than 7664
p1 = (CF − 7348) % 79, p2 = p1 + 1, s1 = S, s2 = −S
dv = ((CF − 7348) − p1) ÷ 79
The dv-th additional diffusion vector of subset 4 (FIG. 9) is used.
(6) When CF is 7664 or more and less than 7976
p1 = (CF − 7664) % 78, p2 = p1 + 2, s1 = S, s2 = −S
dv = ((CF − 7664) − p1) ÷ 78
The dv-th additional diffusion vector of subset 5 (FIG. 10) is used.
As described above, the position p1 and polarity s1 of the pulse 1, the position p2 and polarity s2 of the pulse 2, and the diffusion vector information to be applied are decoded.
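The decoding procedure can likewise be sketched in code. The function name, the tuple layout of the return value, and the treatment of codes above the assigned range as an error are assumptions made for this illustration; subset 0 is used here to denote the basic diffusion vector.

```python
# (first code, positions, p1 offset, p2 - p1, same polarity?) for subsets 1..5
SUBSETS = [
    (6400, 78, 2, -2, True),
    (6712, 79, 1, -1, True),
    (7028, 80, 0, 0, True),
    (7348, 79, 0, 1, False),
    (7664, 78, 0, 2, False),
]


def decode_fixed_code(f):
    """Decode a 14-bit code F into (p1, s1, p2, s2, subset, dv).

    subset 0 with dv None means the basic diffusion vector; subsets 1 to 5
    correspond to FIGS. 6 to 10 and dv (0..3) picks the additional vector.
    """
    s = ((f >> 13) & 1) * 2 - 1          # -1 or +1
    cf = f & 0x1FFF
    if cf < 6400:
        p2 = cf % 80
        p1 = cf // 80
        s2 = -s if p2 > p1 else s
        return p1, s, p2, s2, 0, None
    for number, (base, count, offset, delta, same) in enumerate(SUBSETS, start=1):
        if base <= cf < base + 4 * count:
            rel = cf - base
            p1 = rel % count + offset
            dv = rel // count
            return p1, s, p1 + delta, (s if same else -s), number, dv
    raise ValueError("code is outside the assigned range")
```

For example, F = 9084 decodes to pulses at samples 11 (positive) and 12 (negative) with the basic diffusion vector.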
FIG. 14 is a block diagram showing another configuration of the fixed excitation codebook.
The fixed excitation codebook 207 in FIG. 14 has two fixed excitation codebook subsets 608 and 609. The first fixed excitation codebook subset 608 is composed of three blocks: a first pulse excitation codebook 601, a diffusion vector storage 602, and a diffusion vector convolution processor 603. The first pulse excitation codebook 601 is an excitation codebook that generates predetermined pulse excitation vectors (for example, vectors composed of two pulses). The diffusion vector storage 602 stores diffusion vectors designed exclusively for the first pulse excitation codebook 601. The diffusion vector convolution processor 603 convolves the pulse excitation vector output from the first pulse excitation codebook 601 with the diffusion vector output from the diffusion vector storage 602.
Similarly, the second fixed excitation codebook subset 609 is composed of a second pulse excitation codebook 604 (which, unlike the first pulse excitation codebook 601, generates, for example, excitation vectors composed of three or five pulses), a diffusion vector storage 605, and a diffusion vector convolution processor 606.
Here, the spread vector store in each fixed excitation codebook subset is designed exclusively for the pulse excitation codebook of each subset, and stores a different spread vector between the subsets.
In the present embodiment, the number of subsets of the fixed excitation codebook is two. However, in the present invention, the number is not limited, and the same effect can be obtained with three or more.
Further, the pulse excitation codebooks of the subsets may differ in the number of excitation pulses included in an excitation vector or in the pattern of excitation pulses (for example, one pulse excitation codebook may generate only combinations of excitation pulses that are close to each other, while another generates only combinations of excitation pulses that are far apart).
In either case, the performance improvement is larger when each subset generates excitation vectors with different characteristics. The changeover switch 607 selects one of the fixed excitation vectors output from the diffusion vector convolution processor 603 and the diffusion vector convolution processor 606.
In this fixed excitation codebook, the fixed excitation vector specified by the signal (F) input from the parameter determination unit 212 is generated by the first fixed excitation codebook subset 608 or the second fixed excitation codebook subset 609 and is output via the switch 607.
FIG. 15 is a flowchart showing a processing procedure for searching the fixed excitation codebook of FIG.
First, in ST701, a first fixed excitation codebook subset search is performed, and a fixed excitation vector that minimizes the quantization error is selected.
Next, in ST702, the second fixed excitation codebook subset is searched, and if it contains a fixed excitation vector that reduces the quantization error further than the fixed excitation vector selected in ST701, that vector is selected as the final fixed excitation vector.
ST701 and ST702 differ only in that different spreading vectors are applied to different fixed excitation codebooks, and the specific search method is the same as that of the above-described conventional technique. The different fixed excitation codebooks are prepared so that the characteristics of the generated excitation code vectors are different (for example, the number of excitation pulses is different).
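The two-stage search of ST701 and ST702 amounts to keeping the best candidate found so far while the subsets are searched in turn. The sketch below assumes that each subset has already been expanded into a list of candidate fixed excitation vectors (pulse vector convolved with that subset's diffusion vector) and uses a plain squared error in place of the perceptually weighted distortion; the names and toy vectors are hypothetical.

```python
def squared_error(target, candidate):
    """Plain squared error; stands in for the weighted coding distortion."""
    return sum((t - c) ** 2 for t, c in zip(target, candidate))


def search_subsets(target, subsets):
    """Search each subset in order and return (error, subset index, code).

    A later subset replaces the current choice only if it reduces the error
    further, which mirrors the ST701 -> ST702 flow.
    """
    best = None
    for subset_index, candidates in enumerate(subsets):
        for code, vec in enumerate(candidates):
            err = squared_error(target, vec)
            if best is None or err < best[0]:
                best = (err, subset_index, code)
    return best


# Toy subsets: the first holds one-pulse vectors, the second a two-pulse vector.
first_subset = [[1, 0, 0, 0], [0, 0, 1, 0]]
second_subset = [[1, 0, 1, 0]]
choice = search_subsets([1, 0, 1, 0], [first_subset, second_subset])
```

The same loop extends unchanged to three or more subsets.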
For example, fixed excitation codebook subsets with different numbers of excitation pulses are prepared, such that the first fixed excitation codebook subset generates excitation vectors composed of two excitation pulses and the second generates fixed excitation vectors composed of five excitation pulses. Alternatively, fixed excitation codebook subsets are prepared such that the first generates fixed excitation vectors from combinations of excitation pulses that are close to each other and the second generates fixed excitation vectors whose excitation pulses are dispersed throughout the vector. For example, both subsets generate excitation vectors with the same number of pulses, but the first fixed excitation codebook subset generates fixed excitation vectors in which all pulses lie within a range of a predetermined number of samples M (for example, 2 to 10 samples), while the second generates fixed excitation vectors from combinations of excitation pulses in which every interval between excitation pulses is a predetermined number of samples M′ (for example, 10 samples) or more.
In this way, by applying a dedicated diffusion vector to a sound source vector having a specific shape that is frequently used, the quality of the restored speech can be improved efficiently. Alternatively, the quality of the restored speech can be improved efficiently by applying different diffusion vectors according to the characteristics of the pulsed sound source vector.
If a plurality of dedicated diffusion vectors are prepared only for pulse excitation vectors of specific, frequently used shapes, the increase in the number of diffusion vector patterns is small enough to cause almost no problem, and designing the diffusion vector patterns likewise poses almost no problem.
On the other hand, the quality of the restored speech can be improved very efficiently. In other words, preparing a large number of diffusion vectors that do not contribute to the actual sound quality is wasteful, whereas the present invention achieves the effect of efficiently improving the sound quality by adding only a small number of dedicated diffusion patterns (additional diffusion vectors).
The fixed excitation codebook described above can be realized in hardware. It can also be realized by storing the necessary vector data in a database and using that data to generate the waveform data of the fixed excitation vectors as needed.
(Embodiment 2)
Conventionally, a digital filter with a high-frequency emphasis function has been provided in the part that processes the signal after the synthesis filter. This filter is generally a high-pass filter expressed as a first-order digital filter, as described in, for example, J-H. Chen and A.M. Gersho, “Adaptive Postfiltering for Quality Enhancement of Coded Speech”, IEEE Trans. Speech & Audio Processing, Vol. 3, No. 1, Jan. 1995.
On the other hand, the feature of the present embodiment is that an original high-frequency emphasis process is performed on the signal before passing through the synthesis filter on the speech decoding side.
FIG. 16 is a block diagram showing a configuration of speech decoding apparatus 111 in FIG.
In FIG. 16, the encoded information output from the RF demodulator 110 is separated into individual codes by the demultiplexing unit 801. The separated LPC code (L) is output to the LPC decoding unit 802, the separated adaptive excitation vector code (A) is output to the adaptive excitation codebook 805, the separated excitation gain code (G) is output to the quantization gain generation unit 806, and the separated fixed excitation vector code (F) is output to the fixed excitation codebook 807.
The LPC decoding unit 802 decodes the LPC from the code (L) output from the demultiplexing unit 801 and outputs the LPC to the synthesis filter 803. The adaptive excitation codebook 805 extracts, as an adaptive excitation vector, one frame of samples from the past driving excitation signal samples at the position specified by the code (A) output from the demultiplexing unit 801, and outputs it to the multiplier 808.
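The extraction performed by the adaptive excitation codebook can be sketched as follows. It is assumed here that the code (A) has already been decoded into an integer lag in samples, and that a lag shorter than the frame is handled by repeating the extracted segment, as is common in CELP decoders; neither detail is stated in this passage, and the function name is hypothetical.

```python
def adaptive_excitation_vector(past_excitation, lag, frame_len):
    """Take frame_len samples starting lag samples back in the past excitation.

    Assumptions: integer lag, and periodic repetition when lag < frame_len.
    """
    if not 0 < lag <= len(past_excitation):
        raise ValueError("lag must lie within the stored past excitation")
    start = len(past_excitation) - lag
    return [past_excitation[start + (i % lag)] for i in range(frame_len)]
```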
The quantization gain generation unit 806 decodes the adaptive excitation vector gain and the fixed excitation vector gain specified by the excitation gain code (G) output from the demultiplexing unit 801, and outputs them to the multipliers 808 and 809.
The fixed excitation codebook 807 generates a fixed excitation vector specified by the code (F) output from the demultiplexing unit 801 and outputs the fixed excitation vector to the multiplier 809.
Multiplier 808 multiplies the adaptive excitation vector by the adaptive excitation vector gain and outputs the result to adder 810. Multiplier 809 multiplies the fixed excitation vector by the fixed excitation vector gain and outputs the result to adder 810.
The adder 810 adds the gain-multiplied adaptive excitation vector and the gain-multiplied fixed excitation vector output from the multipliers 808 and 809, generates a driving excitation vector, and outputs it to the high frequency emphasis unit 811.
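The operation of the multipliers 808 and 809 and the adder 810 amounts to a gain-weighted sum of the two vectors. A minimal Python sketch follows; the function and parameter names are assumptions made for illustration.

```python
def driving_excitation(adaptive_vec, fixed_vec, gain_adaptive, gain_fixed):
    """Scale each vector by its decoded gain (multipliers 808, 809) and add (adder 810)."""
    if len(adaptive_vec) != len(fixed_vec):
        raise ValueError("both vectors must have the same length")
    return [gain_adaptive * a + gain_fixed * f
            for a, f in zip(adaptive_vec, fixed_vec)]
```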
The high frequency emphasis unit (high frequency emphasis post filter) 811 applies its own high-frequency emphasis process to the driving excitation vector (for example, a process in which the degree of amplitude emphasis increases as the frequency becomes higher) and outputs the emphasized signal to the synthesis filter 803. Details of the high frequency emphasis unit 811 will be described later.
The synthesis filter 803 performs filter synthesis using the filter coefficients decoded by the LPC decoding unit 802, with the excitation vector output from the high frequency emphasis unit 811 as the drive signal, and outputs the synthesized signal to the post-processing unit 804.
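Filter synthesis of this kind is conventionally an all-pole filter 1/A(z) driven by the excitation. The sketch below assumes the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and a filter memory carried between frames; both are assumptions, since the specification does not give the filter equation here.

```python
def synthesis_filter(excitation, lpc, state=None):
    """All-pole synthesis: y[n] = x[n] - sum_k lpc[k-1] * y[n-k].

    lpc holds a_1..a_p (assumed sign convention). state holds the previous
    outputs, most recent first, and is returned for use with the next frame.
    """
    order = len(lpc)
    mem = list(state) if state is not None else [0.0] * order
    out = []
    for x in excitation:
        y = x - sum(a * m for a, m in zip(lpc, mem))
        out.append(y)
        mem = ([y] + mem)[:order]   # shift the new output into the memory
    return out, mem
```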
The post-processing unit 804 applies processing that improves the subjective quality of speech, such as formant enhancement and pitch enhancement, processing that improves the subjective quality of stationary noise, and the like, and outputs the result as the final decoded speech signal to the D/A converter 112.
Next, the high-frequency emphasis process will be specifically described with reference to FIG.
In general, CELP encoding tends to attenuate the high frequency components of the decoded signal. This tendency is particularly strong at low bit rates, so the subjective quality can be improved to some extent by emphasizing the high frequency components of the decoded signal.
In the high frequency emphasis unit (high frequency emphasis post filter) 811 in FIG. 17, the sound source vector is input to a high pass filter (HPF) 901, an adder 902 and an adder 903.
The high-pass filter 901 serves to extract a band component to be emphasized. The components of the driving excitation vector that are higher than the cutoff frequency of the high-pass filter 901 are output to the adder 903, logarithmic power calculator 904, and multiplier 906.
The adder 903 subtracts the high frequency component of the sound source vector from the sound source vector and outputs the result to the logarithmic power calculator 905.
The logarithmic power calculator 904 calculates the logarithmic power of the high frequency component of the sound source vector and outputs it to the power ratio calculator 907. The logarithmic power calculator 905 calculates the logarithmic power of the signal obtained by removing the high frequency component from the sound source vector, and outputs it to the power ratio calculator 907.
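The two logarithmic power measurements can be sketched as follows. Taking power as the sum of squares over the subframe, using a base-10 logarithm, and flooring the power to avoid log(0) are assumptions made for this illustration, as are the function names.

```python
import math


def log_power(signal, floor=1e-12):
    """Base-10 logarithm of the sum of squares (floored to avoid log(0))."""
    power = sum(v * v for v in signal)
    return math.log10(max(power, floor))


def band_log_powers(ex, exh):
    """Return the logarithmic powers of the high band and of the remainder.

    ex  : excitation vector for one subframe
    exh : output of the high-pass filter 901 for the same subframe
    """
    exl = [x - h for x, h in zip(ex, exh)]   # adder 903: remove the high band
    return log_power(exh), log_power(exl)    # calculators 904 and 905
```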
The power ratio calculator 907 calculates a logarithmic power ratio between the high frequency component of the sound source vector and the other components, and outputs it to the enhancement coefficient calculator 908.
The enhancement coefficient calculator 908 calculates a coefficient (enhancement coefficient Rr) by which the high frequency component of the sound source vector is multiplied so that the logarithmic power ratio is basically constant.
Specifically, when the signal output from the logarithmic power calculator 904 is Eh [i] and the signal output from the logarithmic power calculator 905 is El [i], the logarithmic power ratio R output from the power ratio calculator 907 is expressed by the following equation (1), where L is the subframe length.

Therefore, in order to bring the logarithmic power ratio R to a constant value Cr (for example, 0.42), the enhancement coefficient calculator 908 obtains the coefficient Rr as the ratio between Cr and R (the logarithmic power ratio) by the following equation (2).

The limiter 909 sets an upper limit value (for example, 0) and a lower limit value (for example, 0.3) for the coefficient Rr. When the value of the coefficient Rr calculated by the enhancement coefficient calculator 908 is larger than the upper limit value, the limiter 909 sets the coefficient Rr to the upper limit value, and when it is smaller than the lower limit value, the limiter 909 sets the coefficient Rr to the lower limit value.
The smoothing circuit 910 smooths the enhancement coefficient Rr over time (between samples and/or between subframes) so that its value changes smoothly between subframes or samples.
Specifically, first, as shown in the following formula (3), the logarithmic power ratio is returned to the linear region and 1 is subtracted. This is because the high frequency component is added to the original sound source signal (from the adder 810), from which nothing has been subtracted, so only the portion exceeding 1.0 should be added.

Then, smoothing is performed according to the following equation (4) so that Rr1 changes smoothly between (sub) frames. Note that the smoothing coefficient α is set to such an extent that the smoothing is not so strong (for example, α = 0.3).

Furthermore, when the smoothed enhancement coefficient Rr1′ is multiplied by the output signal exh[i] of the high-pass filter 901 and added to the sound source vector ex[i], Rr1′ is further smoothed for each sample into Rr1″ by the following equation (5). Note that this smoothing is strong (for example, β = 0.9).
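The limiting, conversion and two-stage smoothing described in words above can be sketched as follows. Because equations (3) to (5) are not reproduced in this text, the forms used are assumptions: a base-10 conversion 10^Rr - 1, and first-order recursive smoothing in which α and β weight the previous value. The bounds 0 and 0.3 are the example values quoted for the limiter, taken here with 0 as the lower bound and 0.3 as the upper bound, which is also an assumption. All names are hypothetical.

```python
def limit_coefficient(rr, lower=0.0, upper=0.3):
    """Clamp Rr to [lower, upper] (limiter 909; bound order is an assumption)."""
    return min(max(rr, lower), upper)


def log_to_linear_excess(rr):
    """Return to the linear region and subtract 1 (assumed form of formula (3))."""
    return 10.0 ** rr - 1.0


class GainSmoother:
    """Two-stage smoothing (assumed forms of equations (4) and (5))."""

    def __init__(self, alpha=0.3, beta=0.9):
        self.alpha = alpha        # weak smoothing between subframes
        self.beta = beta          # strong smoothing between samples
        self.frame_gain = 0.0     # Rr1'  (state carried across subframes)
        self.sample_gain = 0.0    # Rr1'' (state carried across samples)

    def next_subframe(self, rr, length):
        rr1 = log_to_linear_excess(limit_coefficient(rr))
        self.frame_gain = self.alpha * self.frame_gain + (1.0 - self.alpha) * rr1
        gains = []
        for _ in range(length):
            self.sample_gain = (self.beta * self.sample_gain
                                + (1.0 - self.beta) * self.frame_gain)
            gains.append(self.sample_gain)
        return gains
```

Starting from zero state, the per-sample gain rises gradually toward the subframe value rather than jumping to it, which is the behaviour the smoothing is meant to provide.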

The multiplier 906 multiplies the high-frequency component exh [i] of the sound source vector, which is an output from the high-pass filter 901, by the enhancement coefficient Rr1 ″ smoothed by the smoothing circuit 910.
The adder 902 adds the high frequency component Rr1″ × exh[i], which has been multiplied by the smoothed coefficient, to the excitation vector ex[i], and outputs the resulting excitation vector exn[i] to the synthesis filter 803.
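The multiplier 906 and the adder 902 together compute exn[i] = ex[i] + Rr1″ × exh[i] for each sample. A minimal Python sketch follows; the function name is an assumption, and the per-sample gains are passed in as a list.

```python
def emphasize_high_band(ex, exh, gains):
    """Add the gain-weighted high band to the excitation, sample by sample.

    ex    : excitation vector ex[i]
    exh   : high-pass filter output exh[i]
    gains : smoothed coefficient for each sample
    """
    if not (len(ex) == len(exh) == len(gains)):
        raise ValueError("ex, exh and gains must have the same length")
    return [x + g * h for x, h, g in zip(ex, exh, gains)]
```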
Note that exn[i] may be output to the synthesis filter 803 as it is, but it is more common to scale it so that it has the same energy as the original sound source vector ex[i]. Such scaling may be performed after the adder 902, or Rr1″ may be calculated with the scaling taken into account. In the latter case, an input line from the high-pass filter 901 to the smoothing circuit 910 is required. In the former case, a scaling processing unit is inserted between the adder 902 and the synthesis filter 803, and the sound source vector (from the adder 810) and the high-frequency emphasized sound source vector (from the adder 902) are input to the scaling processing unit.
Specific processing is as follows.
(When performed after the adder 902)
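One way the scaling after the adder 902 could look is sketched below. The formula is not reproduced in this text, so matching the subframe energy with a single square-root gain is an assumption, and the function name is hypothetical.

```python
import math


def scale_to_reference_energy(exn, ex):
    """Scale exn so that its energy equals that of the original vector ex."""
    energy_ref = sum(v * v for v in ex)
    energy_new = sum(v * v for v in exn)
    if energy_new == 0.0:
        return list(exn)              # nothing to scale
    gain = math.sqrt(energy_ref / energy_new)
    return [gain * v for v in exn]
```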

(When scaling processing is included in Rr1 ″)

The characteristics of the high-pass filter 901 are adjusted so that the subjective quality of the decoded speech signal is the best. Specifically, when the sampling frequency is 8 kHz, it is preferable to use a second-order IIR filter with a cutoff frequency of around 3 kHz. In the embodiment of the present invention, the cutoff frequency can be freely designed according to the excitation signal encoding characteristics of the encoding device. Further, the order of the high-pass filter can be freely designed according to the required filter characteristics and the allowable amount of calculation.
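As an illustration, a second-order IIR high-pass filter with a 3 kHz cutoff at an 8 kHz sampling rate can be designed with the widely used bilinear-transform biquad formulas. The choice of these formulas and of Q = 1/√2 (a Butterworth response) are assumptions; the specification does not give the filter coefficients.

```python
import math


def highpass_biquad(cutoff_hz, sample_rate_hz, q=1.0 / math.sqrt(2.0)):
    """Return normalized (b, a) coefficients of a second-order high-pass filter."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    c = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 + c) / 2.0 / a0, -(1.0 + c) / a0, (1.0 + c) / 2.0 / a0]
    a = [1.0, -2.0 * c / a0, (1.0 - alpha) / a0]
    return b, a


def iir_filter(b, a, x):
    """Direct-form I second-order filter starting from zero state."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for v in x:
        y = b[0] * v + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, v
        y2, y1 = y1, y
    return out


# Example: extract the band to be emphasized from an excitation subframe.
b, a = highpass_biquad(3000.0, 8000.0)
```

A constant (DC) input is removed by this filter, while a component at half the sampling rate passes with unit gain.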
In this way, by performing high-frequency emphasis with a digital filter that has its own transfer function, the gain reduction in the high-frequency region of the excitation signal can be compensated and a flat characteristic can be realized. Effective filter characteristics are thus obtained, and the quality of the restored speech can be effectively improved. For example, high-frequency emphasis prevents the restored speech from having a muffled subjective quality.
Further, it is easy to provide the high-frequency emphasis post filter before the synthesis filter, and it is easy to apply the present invention to an actual product.
As described above, according to the present invention, it is possible to efficiently improve the quality of restored speech by adding a minimum amount of hardware or the like. Further, according to the present invention, it is possible to improve the performance of a fixed excitation codebook having a pulse spreading structure. In addition, it is possible to effectively compensate for the high frequency attenuation of the excitation vector in CELP encoding and improve the subjective quality.
The fixed excitation vector generation method, the CELP speech encoding method, and the CELP speech decoding method of the present invention can each be realized by installing a program from a communication line or from a CD or other storage medium and executing it with a control means such as a CPU.
This specification is based on Japanese Patent Application No. 2002-043878 filed on Feb. 20, 2002, the content of which is incorporated herein.
Industrial applicability
The present invention is suitable for use in a CELP speech coding apparatus or a CELP speech decoding apparatus.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an example of a configuration of a fixed excitation codebook having a conventional pulse spreading structure.
FIG. 2 is a diagram showing an outline of the overall configuration of an audio signal transmitting device and an audio signal receiving device in the present invention.
FIG. 3 is a block diagram showing the configuration of the speech encoding apparatus according to Embodiment 1 of the present invention.
FIG. 4 is a block diagram showing a configuration of a fixed excitation codebook according to Embodiment 1 of the present invention.
FIG. 5A is a diagram showing a distribution of usage frequencies of pulsed sound source vectors according to Embodiment 1 of the present invention.
FIG. 5B is a diagram showing a distribution of usage frequencies of pulsed sound source vectors according to Embodiment 1 of the present invention.
FIG. 6 is a diagram showing an example of an additional diffusion vector according to Embodiment 1 of the present invention.
FIG. 7 is a diagram showing an example of an additional diffusion vector according to Embodiment 1 of the present invention.
FIG. 8 is a diagram showing an example of an additional diffusion vector according to Embodiment 1 of the present invention.
FIG. 9 is a diagram showing an example of an additional diffusion vector according to Embodiment 1 of the present invention.
FIG. 10 is a diagram showing an example of an additional diffusion vector according to Embodiment 1 of the present invention.
FIG. 11 is a diagram showing an example of a basic diffusion vector according to Embodiment 1 of the present invention.
FIG. 12 is a diagram for specifically explaining the content of the selection process of the diffusion vector store according to Embodiment 1 of the present invention.
FIG. 13 is a flowchart showing a processing procedure of the fixed excitation codebook according to Embodiment 1 of the present invention.
FIG. 14 is a block diagram showing another configuration of the fixed excitation codebook according to Embodiment 1 of the present invention.
FIG. 15 is a flowchart showing a processing procedure when searching for a fixed excitation codebook according to Embodiment 1 of the present invention.
FIG. 16 is a block diagram showing the configuration of the speech decoding apparatus according to Embodiment 2 of the present invention.
FIG. 17 is a block diagram showing a configuration of a high frequency emphasizing unit according to Embodiment 2 of the present invention.

Claims

A fixed excitation vector generation method for generating a fixed excitation vector required in a CELP speech encoding apparatus or a CELP speech decoding apparatus by convolving a diffusion vector with a pulse excitation vector,
A fixed excitation vector generation method in which a plurality of diffusion vectors are prepared, a diffusion vector of the optimum shape is selected according to the shape of the excitation vector, and the fixed excitation vector is generated by convolving the selected diffusion vector with the excitation vector.

In claim 1,
A fixed excitation vector generation method in which a basic diffusion vector used in common for pulse excitation vectors and an additional diffusion vector used for a vector having a predetermined shape are prepared, and the fixed excitation vector is generated using the basic diffusion vector or the additional diffusion vector.

A fixed excitation codebook that generates a fixed excitation vector by convolving a diffusion vector with a pulse excitation vector,
A fixed excitation codebook comprising means for selecting an optimum diffusion vector shape from a plurality of diffusion vectors according to the shape of the excitation vector, and means for convolving the selected diffusion vector with the excitation vector.

In claim 3,
A fixed excitation codebook comprising a diffusion vector storage that stores a basic diffusion vector used in common for pulse excitation vectors and an additional diffusion vector used for a vector having a predetermined shape, the fixed excitation codebook generating the fixed excitation vector using the basic diffusion vector or the additional diffusion vector.

In claim 4,
A fixed excitation codebook comprising a pulse excitation vector shape determiner, in which the fixed excitation vector is generated using the additional diffusion vector only when the shape determiner determines that the pulse excitation vector has the predetermined shape.

In claim 3,
A fixed excitation codebook comprising at least two types of pulse excitation codebooks that output excitation vectors consisting of different numbers of pulses or of different combinations of positions at which pulses can occur, and a diffusion vector storage unit that stores diffusion vectors designed specifically for each of the pulse excitation codebooks.

A CELP speech coding apparatus having a fixed excitation codebook,
The fixed excitation codebook has means for selecting a diffusion vector of the optimum shape from a plurality of diffusion vectors according to the shape of the excitation vector, and means for generating a fixed excitation vector by convolving the selected diffusion vector with the excitation vector.

A CELP speech decoding apparatus for receiving an excitation gain code, an adaptive excitation vector code, and a fixed excitation vector code transmitted from the CELP speech encoding apparatus according to claim 7, and decoding speech,
A CELP speech decoding apparatus comprising: quantization gain generating means for decoding the adaptive excitation vector gain and the fixed excitation vector gain specified by the excitation gain code; an adaptive excitation codebook that extracts, as an adaptive excitation vector, one frame of samples from the past driving excitation signal samples specified by the adaptive excitation vector code; a fixed excitation codebook that generates the fixed excitation vector specified by the fixed excitation vector code; driving excitation vector generation means for generating a driving excitation vector by adding the adaptive excitation vector multiplied by the adaptive excitation vector gain and the fixed excitation vector multiplied by the fixed excitation vector gain; high frequency emphasis means for performing high frequency emphasis processing on the driving excitation vector; and a synthesis filter that performs filter synthesis, using filter coefficients, on the driving excitation vector output from the high frequency emphasis means.

In claim 8,
The high frequency emphasis means includes: a high-pass filter that passes the high frequency component of the driving excitation vector; a first logarithmic power calculator that calculates the logarithmic power of the driving excitation vector after it has passed through the high-pass filter; an adder that subtracts the driving excitation vector after the high-pass filter from the driving excitation vector before the high-pass filter; a second logarithmic power calculator that calculates the logarithmic power of the driving excitation vector from which the high frequency component has been removed by the adder; a power ratio calculator that calculates the ratio of the logarithmic powers calculated by the two logarithmic power calculators; and a coefficient calculator that calculates the value of the coefficient by which the driving excitation vector after the high-pass filter is multiplied so that the power ratio becomes a constant value,
A CELP speech decoding apparatus that performs high-frequency emphasis processing by multiplying a signal component that has passed through the high-pass filter by a coefficient calculated by the coefficient calculator and adding the result to the driving excitation vector.

A program for generating a fixed sound source vector by convolving a diffusion vector with a pulse sound source vector,
A program comprising a step of selecting an optimal diffusion vector shape from a plurality of diffusion vectors according to the shape of a sound source vector, and a step of convolving the selected diffusion vector with the sound source vector.