JP3731575B2

JP3731575B2 - Encoding device and decoding device

Info

Publication number: JP3731575B2
Application number: JP2002306411A
Authority: JP
Inventors: 正之西口; 淳松本
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2002-10-21
Filing date: 2002-10-21
Publication date: 2006-01-05
Anticipated expiration: 2021-01-05
Also published as: JP2003216189A

Abstract

PROBLEM TO BE SOLVED: To convert the number of data items in an encoder while drastically reducing the amount of computation.

SOLUTION: A variable number of data items in each input block is nonlinearly compressed by a nonlinear compression part 12, both ends of the spectral envelope are extended by a spectrum envelope expansion part 14, FIR (finite impulse response) filtering is performed by an FIR filter 15, and linear interpolation is performed by a linear interpolation part 16, so that the data is converted into a fixed number of sample data items.

COPYRIGHT: (C)2003,JPO

Description

【０００１】
【産業上の利用分野】
本発明は、符号化装置及び復号装置に関し、特に、音声合成分析装置（ボコーダ）等において算出されたスペクトルの振幅データのような可変個数のデータを一定個数のデータに変換するようなデータ数変換を伴う符号化装置及び復号装置に関する。
【０００２】
【従来の技術】
オーディオ信号（音声信号や音響信号を含む）の時間領域や周波数領域における統計的性質と人間の聴感上の特性を利用して信号圧縮を行うような符号化方法が種々知られている。この符号化方法としては、大別して時間領域での符号化、周波数領域での符号化、分析合成符号化等が挙げられる。
【０００３】
音声信号等の高能率符号化の例として、ＭＢＥ（Multiband Excitation: マルチバンド励起）符号化、ＳＢＥ（Singleband Excitation:シングルバンド励起）符号化、ハーモニック（Harmonic）符号化、ＳＢＣ（Sub-band Coding:帯域分割符号化）、ＬＰＣ（Linear Predictive Coding: 線形予測符号化）、あるいはＤＣＴ（離散コサイン変換）、ＭＤＣＴ（モデファイドＤＣＴ）、ＦＦＴ（高速フーリエ変換）等において、スペクトル振幅やそのパラメータ（ＬＳＰパラメータ、αパラメータ、ｋパラメータ等）のような各種情報データを量子化する場合に、従来においてはスカラ量子化を行うことが多い。
【０００４】
【発明が解決しようとする課題】
ところで、ビットレートを例えば３〜４ｋbps 程度にまで低減し、量子化効率を更に向上させようとすると、スカラ量子化では量子化雑音（歪み）が大きくなってしまい、実用化が困難であった。そこで、これらの符号化の際に得られる時間軸データや周波数軸データやフィルタ係数データ等を個々に量子化せず、複数個のデータを組（ベクトル）にまとめて一つの符号で表現して量子化するベクトル量子化が注目されている。
【０００５】
しかしながら、上記ＭＢＥ、ＳＢＥ、ＬＰＣ等のスペクトル振幅データ等は、ピッチに依存して個数が変化するため、そのままベクトル量子化しようとすると可変次元のベクトル量子化が必要となり、構成が複雑化するのみならず、良好な特性を得ることが困難である。
【０００６】
また、量子化の前にデータのブロック（フレーム）間差分をとるような場合にも、前後のブロック（フレーム）内のデータの個数が一致していないと、差分をとることができない。このように、可変個数のデータを一定個数に変換することがデータ処理の過程で必要とされることがあるが、特性の良好なデータ数変換が望まれる。
【０００７】
そこで、本出願人は、特願平４−９２２６３号特許出願の明細書及び図面において、可変個数のデータを一定個数に変換することができ、端点でリンギング等の発生しない特性の良好なデータ数変換が行えるようなデータ数変換方法を提案した。この方法は、ブロック毎に可変個数のデータを非線形圧縮部で非線形圧縮し、ダミーデータ付加部でブロック内の最後のデータ値から最初のデータ値までの補間をするようなダミーデータを付加してデータ個数を拡大した後、高速フーリエ変換（ＦＦＴ）処理部、逆高速フーリエ変換（ＩＦＦＴ）処理部等を有した帯域制限型のオーバーサンプリング部でオーバーサンプルし、直線補間部で直線補間し、間引き処理部で間引くことにより一定個数のサンプルデータに変換するものである。
【０００８】
この出願によるデータ数変換方法では、ＦＦＴをする際に、１ブロックを例えば２５６サンプルに延長して計算している。次に、例えば８倍のオーバーサンプリングを実現するために、ＦＦＴ変換により得られた２５６サンプルのスペクトルデータに対し、各サンプルの中間に７（＝８−１）個の０を詰めるような中間０詰め処理を行って２０４８サンプルとし、この２０４８サンプルに対してＩＦＦＴの計算を行っている。
【０００９】
ところで、通常のＦＦＴ、ＩＦＦＴでは、１ブロックのサンプル数をＮとするとき、（Ｎ／２×log_２Ｎ）の複素乗算と、（Ｎlog_２Ｎ）の複素加算が行われている。ここで、（Ｎ／２log_２Ｎ）の複素乗算は、（Ｎ／２×log_２Ｎ×４）の実数乗算となり、（Ｎlog_２Ｎ）の複素加算は、（Ｎlog_２Ｎ×２）の実数加算となる。したがって、Ｎを２５６としたときのＦＦＴの演算量は、４０９６回（＝２５６／２×８×４）となり、Ｎ＝２０４８としたときのＩＦＦＴの演算量は、４５０５６回（＝２０４８／２×１１×４）となり、その合計は４９１５２回となる。
【００１０】
また、全実数入力に対して、Ｎ／２点のＦＦＴで、Ｎ点ＦＦＴが実現できる、いわゆる高速化の手法を用いたとしても、Ｎ／４（log_２Ｎ−１）×４＋Ｎ×４の実数乗算と、Ｎ／２（log_２Ｎ−１）×２＋Ｎ×２の実数加算が必要となる。すなわち、Ｎ＝２５６としたときのＦＦＴでは、乗算が２８１６回、加算が２３０４回行われる。また、Ｎ＝２０４８としたときのＩＦＦＴでは、乗算が２８６７２回、加算が２４５７６回行われる。したがって、乗算だけでも３１４８８回の演算が必要となる。
【００１１】
なお、以上はエンコードの際において、ブロック（フレーム）内で可変個数（８〜６３個）のサンプルデータを一定個数（４４個）のサンプルデータに変換するデータ数（サンプルレート）変換を想定しているが、デコードの場合も同様な方法でブロック（フレーム）内の一定個数（４４個）のサンプルデータを可変個数（８〜６３個）のサンプルデータに変換しているものである。
【００１２】
ところで、実際に求めたい点の数は、エンコードの際には２０４８点でＩＦＦＴした内の約４４点程であり、また、デコードの際を考慮しても、最終的に得たいサンプル数は最大でも６３個程度であり、このような間引かれた演算を行うという性質が生かされていなかった。
【００１３】
本発明は、このような実情に鑑みてなされたものであり、演算量を低減しながらも、エンコードの際には可変個数のデータを一定個数に変換することができ、またデコードの際には一定個数のデータを可変個数のデータに変換することができるようなデータ数変換を用いた符号化装置及び復号装置の提供を目的とする。
【００１５】
【課題を解決するための手段】
本発明に係る符号化装置は、入力オーディオ信号をブロックに分割して、ブロック内の可変個数の波形データ又は波形を表すパラメータデータを抽出し、上記抽出された可変個数の系列データをブロック毎に一定の個数の基準データと比較するために上記可変個数の系列データを上記一定個数の系列データに変換して符号化する符号化装置であって、複数の係数セットを記憶する記憶手段と、上記可変個数の系列データに対し当該系列の両端にデータを付加して所定の一定個数のデータからなる新たな系列データを生成し、上記一定個数のデータの各位置に対応する係数セットを上記記憶手段から選択し、選択された係数セットに含まれる複数の係数それぞれに対し、当該係数毎に対応付けられる上記新たな系列データとを掛け合わせ、掛け合わせることによって算出された複数の値を加算することにより中間的な出力データを求める手段と、上記中間的な出力データを補間して必要とされる一定個数の系列データを求める手段とを有することにより、上述の課題を解決する。
【００１７】
また、本発明に係る復号装置は、入力オーディオ信号をブロックに分割して、ブロック内の可変個数の波形データ又は波形を表すパラメータデータを抽出し、上記抽出された可変個数の系列データをブロック毎に一定の個数の基準データと比較するために上記可変個数の系列データを上記一定個数の系列データに変換することにより符号化された符号列を受け取り、上記符号列から上記一定個数の系列データを復号化し、上記復号化された一定個数の系列データから可変個数の系列データに逆変換する復号装置であって、複数の係数セットを記憶する記憶手段と、上記復号化された一定個数の系列データに対し当該系列の両端にデータを付加して所定の一定個数のデータからなる新たな系列データを生成し、上記一定個数のデータの各位置に対応する係数セットを上記記憶手段から選択し、選択された係数セットに含まれる複数の係数それぞれに対し、当該係数毎に対応付けられる上記新たな系列データとを掛け合わせ、掛け合わせることによって算出された複数の値を加算することにより中間的な出力データを求める手段と、上記中間的な出力データを補間して必要とされる可変個数の系列データを求める手段とを有することにより、上述の課題を解決する。
【００１８】
【実施例】
以下、本発明に係る符号化装置及び復号装置の実施例について、図面を参照しながら説明する。
【００１９】
図１は本発明の第１の実施例となる符号化装置に用いられるデータ数変換の概略構成を示している。この第１の実施例は後述するＭＢＥボコーダに適用される。すなわち、ＭＢＥボコーダにより算出されたスペクトルエンベロープの個数が可変とされた振幅データを一定個数に変換する方法である。
【００２０】
図１において、入力端子１１には、後述するＭＢＥボコーダにより算出されたスペクトルエンベロープの振幅データ等が供給されている。この振幅データは、例えば図２のＡに示すようなスペクトルを有する音声信号を分析して、ピッチ周波数（角周波数）ωを求め、このピッチ周波数ωに応じたスペクトルの周期性を考慮して、各高調波（ハーモニクス）位置での振幅から、図２のＢに示すようなスペクトル包絡（エンベロープ）を表す振幅データとして求められる。この振幅データの個数は一定の有効帯域（例えば２００〜３４００Ｈｚ）内でピッチ周波数ωに依存して変化する。そこで、図２のＣに示すように一定の固定周波数（角周波数）ω_ｃの各高調波位置での上記スペクトル包絡の振幅データを求めることで、データ個数を一定にできる。
【００２１】
図１の例では、入力端子１１からの可変数Ｍ個（例えばＭ＝８〜６３である）の入力データを、非線形圧縮部１２にて例えばｄＢ領域に圧縮（対数圧縮）した後、データ個数変換本体部１３にて一定個数のデータに変換している。データ個数変換本体部１３は、スペクトルエンベロープ拡張部１４、帯域制限型ＦＩＲフィルタ１５及び直線補間部１６から成っている。
【００２２】
入力されたブロック毎の可変数Ｍ個の入力データは、非線形圧縮部１２で非線形圧縮され、スペクトルエンベロープ拡張部１４でスペクトルエンベロープの両端の値を繰り返して前後に延長される。この両端が前後に延長されたスペクトルエンベロープは、ＦＩＲフィルタ１５に供給される。このＦＩＲフィルタ１５は入力データのサンプル点に対してそれぞれ異なる複数の位相と対応した複数の係数セットの内の上記一定個数のデータの各位置の近傍の位置に対応する係数セットを用いることにより、中間的な出力データを求める。この中間的な出力データは、直線補間部１６に供給され、直線補間されて最終出力に必要とされる一定個数のデータとなり、出力端子１７から出力される。
【００２３】
ここで、後述するＭＢＥボコーダにおいて算出されるＭ個（ｍ_ＭＸ＋１個）の振幅データ列をａ〔ｍ〕とする。ｍは上記高調波（ハーモニックス）の次数あるいはバンド番号であり、ｍ_ＭＸが最大値であるが、ｍ＝０のバンドの振幅データも含めて、全バンドの振幅データの個数はｍ_ＭＸ＋１個となる。この振幅データａ〔ｍ〕を、非線形圧縮部１２にて例えばｄＢ領域に変換する。すなわち得られたデータをａ_ｄＢ〔ｍ〕とするとき、
ａ_ｄＢ〔ｍ〕＝２０ log_１０ａ〔ｍ〕・・・（１）
である。この対数変換された振幅データａ_ｄＢ〔ｍ〕の個数ｍ_ＭＸ＋１は、上述したようにピッチに依存して変化するため、一定個数の振幅データｂ〔ｍ〕に変換する。これは一種のサンプリングレート（サンプルレート）変換である。なお、非線形圧縮部１２での圧縮処理は、ｄＢ領域への対数圧縮の他に、例えばいわゆるμ-lawやα-lawのような疑似対数圧縮処理を施してもよい。このように、振幅を圧縮することにより、能率的な符号化が実現される。
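The dB-domain compression of equation (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `to_db` and the small floor `eps`, which guards against taking the logarithm of zero, are assumptions made for this example and do not appear in the patent.

```python
import math

def to_db(amplitudes, eps=1e-12):
    """Equation (1): a_dB[m] = 20 * log10(a[m]).

    eps is a hypothetical floor (not in the patent) that avoids log10(0).
    """
    return [20.0 * math.log10(max(a, eps)) for a in amplitudes]

# Three example band amplitudes (made-up values).
print(to_db([1.0, 10.0, 0.1]))
```

Because the number of amplitudes is unchanged by this step, the output still has the pitch-dependent length m_MX + 1; the conversion to a fixed length happens afterwards.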
【００２４】
ＭＢＥボコーダに入力される時間軸上の音声信号に対するサンプリング周波数ｆs は、通常８ｋHzで、全帯域幅は３．４ｋHz（ただし有効帯域は２００〜３４００Hz）であり、女声の高い方から男声の低い方までのピッチラグ（ピッチ周期に相当するサンプル数）は、２０〜１４７程度である。従って、ピッチ（角）周波数ωは、8000/147≒５４（Hz）から 8000/20＝４００（Hz）程度までの間で変動することになる。従って、周波数軸上で上記３．４ｋHzまでの間に約８〜６３本のピッチパルス（ハーモニックス）が立つことになる。すなわち、周波数軸上のｄＢ領域の波形として、８サンプル乃至６３サンプルから成るｍ_ＭＸ＋１個のデータを、一定のサンプル数、例えば４４サンプルに、サンプル数変換を行うわけである。これが、図２のＣに示すように、一定のピッチ周波数（角周波数）ω_Ｃ毎のハーモニックスの位置のサンプルを求めることに相当する。
【００２５】
次にスペクトルエンベロープ拡張部１４は、上述したように非線形圧縮部１２で非線形圧縮され、ａ_ｄＢ〔ｍ〕の配列で表せるｍ_ＭＸ＋１個のスペクトルエンベロープの両端の値を前後に延長する。これはスペクトルエンベロープの端点におけるリンギングの発生を防ぐために行われる。このようにしてできた数列をａ_ＪｄＢ〔ｍ〕とすると、このａ_ＪｄＢ〔ｍ〕は−（ｆ_０ −１）／２≦ｍ＜Ｍ＋（ｆ_０ −１）／２の範囲で、
【００２６】
【数１】
ａ_ＪｄＢ〔ｍ〕＝ａ_ｄＢ〔０〕　　　（−（ｆ_０ −１）／２≦ｍ＜０）
ａ_ＪｄＢ〔ｍ〕＝ａ_ｄＢ〔ｍ〕　　　（０≦ｍ＜Ｍ）
ａ_ＪｄＢ〔ｍ〕＝ａ_ｄＢ〔Ｍ−１〕　（Ｍ≦ｍ＜Ｍ＋（ｆ_０ −１）／２）・・・（２）

【００２７】
となる。ここでｆ_０は例えば９で、次に使用するＦＩＲフィルタの（オーバーサンプリング後のサンプリングレートでみた）次数Ｆ_０例えば６５と、Ｆ_０＝Ｏ_Ｓ ×（ｆ_０ −１）＋１という関係にある定数である。また、ｆ_０ −１は、このスペクトルエンベロープ拡張を一種のオーバーサンプリングと考えたとき、オーバーサンプリングする前のサンプリングレートでみたときのフィルタ次数であり、Ｆ_０は、オーバーサンプリング後のサンプリングレートでみたときのフィルタの次数である。また、Ｏ_Ｓは、オーバーサンプリングの比率（レシオ）である。図３はこのａ_ＪｄＢ〔ｍ〕を示す図である。すなわち、このａ_ＪｄＢ〔ｍ〕は、０≦ｍ＜Ｍの区間に示される元の波形ａ_ｄＢ〔ｍ〕の左端部をａ_ｄＢ〔０〕のまま−（ｆ_０ −１）／２まで延長し、右端部を最後のデータであるａ_ｄＢ〔Ｍ−１〕のままＭ＋（ｆ_０ −１）／２まで延長している。
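The end-value extension performed by the spectral envelope expansion unit 14 can be sketched as follows. The function name `extend_envelope` and the returned offset are hypothetical conveniences for this example; with f_0 = 9, four copies of the first value and four copies of the last value are added, as the text describes.

```python
def extend_envelope(a_db, f0=9):
    """Repeat the first and last values (f0 - 1) // 2 times on each side.

    Returns (a_j, offset) such that the patent's a_JdB[m] corresponds to
    a_j[m + offset] for -(f0 - 1)/2 <= m < M + (f0 - 1)/2.
    """
    half = (f0 - 1) // 2
    a_j = [a_db[0]] * half + list(a_db) + [a_db[-1]] * half
    return a_j, half

# Made-up three-point envelope in dB.
a_j, offset = extend_envelope([1.0, 2.0, 3.0])
print(a_j, offset)
```

Holding the end values constant, rather than padding with zeros, is what keeps the FIR filter from producing ringing at the two ends of the envelope.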
【００２８】
本来、帯域制限型のオーバーサンプリングは、例えば（Ｏ_Ｓ −１）個のデータを０詰めしたものにＦ_０次のフィルタを通したものとしてもよいが、０データに対する積和は無視してよい。そのため、帯域制限型のオーバーサンプリングは、８つの位相の係数セット（Ｐ＝０・・・７）の各セット毎に（ｆ_０ −１）個の係数からなり、もとのＦ_０個の係数をオーバーサンプリングするフィルタ処理とみることができる。
【００２９】
図４はＦ_０を６５、ｆ_０を９、Ｏ_Ｓを８とした場合のＦＩＲフィルタ１５の位相の係数を示す図である。図４のＡは、−４πから４πまでの位相の変化範囲でＦ_０個（６５個）の係数の大きさを振幅値として示している。位相変化が０πのとき係数値は１であるが、位相変化が±４π、±３π、±２π、±πのときは０である。また、この図４のＡは０πの振幅を軸に左右対称となっている。図４のＢは、Ｐ＝０・・・７の各位相の係数セットの持つ係数値が図４のＡのどこにあたるかを示している。なお、この係数値は、周知の方法によって導出できる。
【００３０】
ここで、本発明においては、上記入力データのサンプル点に対してそれぞれ異なる複数の位相と対応した複数の係数セットの内の上記一定個数のデータの各位置に対応する係数セットを用いることにより、出力として必要なデータそのもの、あるいは必要とされるデータの近傍のデータを求めることにより、演算自体を間引いて演算量を減らすものである。
【００３１】
図５は、図４に示された係数セット（Ｐ＝０・・・７）を用いてａ_ＪｄＢ〔ｍ〕をフィルタリングし、出力として必要とされる一定個数のデータｂ〔ｍ〕の内の任意の１個を得るための演算を説明するための図である。
【００３２】
図５のＡは、ａ_ｄＢ〔ｍ〕を示す。このａ_ｄＢ〔ｍ〕からｂ〔ｍ〕を得るには、上記スペクトルエンベロープ拡張部１４でａ_ｄＢ〔ｍ〕の両端を延長して図３に示したようなａ_ＪｄＢ〔ｍ〕を先ず得る。ｉは可変数Ｍ個のデータのインデックスである。
【００３３】
例えば、図５のＡに示されたｂ点におけるｂ〔ｍ〕を求めようとする場合を以下に述べる。
【００３４】
このｂ点に最も近傍の位置にある係数値のセットはＰ＝２の係数セットである。このＰ＝２の係数セットは図５のＢに示すような各係数値を持っている。この各係数値をｐ_２０、ｐ_２１、ｐ_２２、ｐ_２３、ｐ_２４、ｐ_２５、ｐ_２６、ｐ_２７とする。すると、ｂ点のｂ〔ｍ〕は、インデックスｉ＝０のデータと係数値ｐ_２０の乗算値と、インデックスｉ＝１のデータと係数値ｐ_２１の乗算値と、インデックスｉ＝２のデータと係数値ｐ_２２の乗算値と、インデックスｉ＝３のデータと係数値ｐ_２３の乗算値と、インデックスｉ＝４のデータと係数値ｐ_２４の乗算値と、インデックスｉ＝５のデータと係数値ｐ_２５の乗算値と、インデックスｉ＝６のデータと係数値ｐ_２６の乗算値と、インデックスｉ＝７のデータと係数値ｐ_２７の乗算値との合計８個の乗算値の和として表せる。
【００３５】
今、Ｆ_０個の係数を０≦ｋ＜Ｆ_０の範囲でcoef〔ｋ〕とすると０≦ｍ＜Ｍ・Ｏ_Ｓの範囲でのｂ〔ｍ〕は、次の（３）式で示される。
【００３６】
【数２】

【００３７】
この（３）式より、例えば、上記ｂ点のｂ〔ｍ〕は、Ｏ_Ｓ＝８であれば、ｂ〔３×８＋２〕であり、ｂ〔２６〕となり、ｂ〔２６〕のサンプルデータの振幅値を求めることになる。
【００３８】
ここで、Ｏ_Ｓ＝８、ｆ_０＝９とすると上記（３）式は、
【００３９】
【数３】

【００４０】
となる。
【００４１】
ここで、Ｐ＝０のとき上記（４）式は、
【００４２】
【数４】

【００４３】
となり、ｉ＝０、１・・・７のデータの振幅をそのまま求めることになる。
【００４４】
また、Ｐ＝１・・・７のとき上記（４）式は、
【００４５】
【数５】

【００４６】
となり、ａ_ＪｄＢ〔ｉ−３〕、ａ_ＪｄＢ〔ｉ−２〕、ａ_ＪｄＢ〔ｉ−１〕、ａ_ＪｄＢ〔ｉ−０〕、ａ_ＪｄＢ〔ｉ＋１〕、ａ_ＪｄＢ〔ｉ＋２〕、ａ_ＪｄＢ〔ｉ＋３〕、ａ_ＪｄＢ〔ｉ＋４〕の８個のデータに対し、coef〔８−Ｐ〕、coef〔16−Ｐ〕、coef〔24−Ｐ〕、coef〔32−Ｐ〕、coef〔40−Ｐ〕、coef〔48−Ｐ〕、coef〔56−Ｐ〕、coef〔64−Ｐ〕の８個の係数が各々乗算され、その８個の乗算値が全て加算されてｂ〔ｍ〕が得られることが分かる。
【００４７】
例えば、上記図５のＡに示されたｂ点のｂ〔ｍ〕の例では、ｉ＝３、Ｐ＝２であるので、ａ_ＪｄＢ〔０〕、ａ_ＪｄＢ〔１〕、・・・ａ_ＪｄＢ〔７〕の８個のデータに対し、coef〔６〕、coef〔14〕、・・・coef〔62〕の８個の係数が各々乗算され、その８個の乗算値が全て加算されてｂ〔26〕が得られる。
【００４８】
また、例えば、ｉ＝０、Ｐ＝３のｂ〔ｍ〕を求める場合は、上記スペクトルエンベロープ拡張部１４で得たデータａ_ＪｄＢ〔−３〕、ａ_ＪｄＢ〔−２〕、ａ_ＪｄＢ〔−１〕の計３個のデータにそれぞれcoef〔５〕、coef〔13〕、coef〔21〕の計３個の係数を乗算した３個の乗算値と、ａ_ＪｄＢ〔０〕、ａ_ＪｄＢ〔１〕、ａ_ＪｄＢ〔２〕、ａ_ＪｄＢ〔３〕、ａ_ＪｄＢ〔４〕の計５個のデータにそれぞれcoef〔29〕、coef〔37〕、coef〔45〕、coef〔53〕、coef〔61〕の５個の係数を乗算した５個の乗算値とからなる計８個の乗算値が加算されてｂ〔３〕が得られる。
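The per-output computation described above (eight inputs a_JdB[i-3] ... a_JdB[i+4] multiplied by coef[8-P], coef[16-P], ..., coef[64-P]) can be sketched in Python as below. The patent only says the 65 coefficients come from a well-known method, so the Hamming-windowed sinc in `make_coef` is an assumption chosen for this example, as are the function names. It does reproduce the stated properties: the value 1 at the centre and zeros at ±π, ±2π, ±3π, ±4π.

```python
import math

O_S = 8                          # oversampling ratio
F0_IN = 9                        # f_0: filter length at the input rate
F0_OS = O_S * (F0_IN - 1) + 1    # F_0 = 65 taps at the oversampled rate
HALF = (F0_IN - 1) // 2          # 4

def make_coef():
    """Hypothetical design: Hamming-windowed sinc (not from the patent)."""
    centre = F0_OS // 2          # index 32
    coef = []
    for k in range(F0_OS):
        x = (k - centre) / O_S
        sinc = 1.0 if k == centre else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 + 0.46 * math.cos(math.pi * (k - centre) / centre)
        coef.append(sinc * window)
    return coef

def fir_output(a_db, coef, i, p):
    """b[i*O_S + p]: 8 products using coefficient set p (p = 0..7)."""
    last = len(a_db) - 1

    def a_j(m):                  # end-value extension, as in unit 14
        return a_db[min(max(m, 0), last)]

    return sum(coef[j * O_S - p] * a_j(i + j - HALF) for j in range(1, F0_IN))

coef = make_coef()
ramp = [float(v) for v in range(10)]     # made-up input data
print(fir_output(ramp, coef, 3, 2))      # the b[26] case: i = 3, P = 2
```

Only the outputs that are actually needed are evaluated, each at a cost of f_0 - 1 = 8 multiplications, which is the source of the savings over computing all oversampled points.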
【００４９】
ここで、ａ_ＪｄＢ〔ｉ＋ｊ−（ｆ_０ −１）／２〕の〔〕内がｉ_ｍｉｎ −（ｆ_０ −１）／２＝１−（ｆ_０ −１）／２、ｉ_ｍａｘ＋（ｆ_０ −１）／２＝ｍ_ＭＸ＋（ｆ_０ −１）／２に関しては、スペクトルエンベロープ拡張部１４により、データが拡張されているので問題はない。ここで、ｂ〔ｍ〕の一点を求めるのに必要な積は（ｆ_０ −１）回である。
【００５０】
ところで、以上の説明においては、（例えば８倍の）オーバーサンプル点のいずれかの位置に上記最終的に必要とされる一定個数（例えば４４個）のデータの位置が一致するものとして説明したが、現実には、このような一致を得るためにはオーバーサンプルの比率（倍数）を極めて高くとることが必要とされ、フィルタ係数の個数が膨大なものとなることより、最終的には必要とされるデータの位置の近傍（例えば前後の２点）のオーバーサンプル点のデータを中間的な出力として上記フィルタリング演算により求め、この中間的な出力を補間処理することで、上記最終的に必要とされるデータを求めることが好ましい。
【００５１】
すなわち、上記ＦＩＲフィルタ１５からのＦＩＲ出力は、直線補間部１６に供給される。この直線補間部１６は、上記ＦＩＲフィルタ１５からの少なくとも２つのＦＩＲ出力を直線補間し、必要な出力点を得る。例えば、図６において点Ａ_０を直線補間で求めるには、その点Ａ_０を挟む２点Ａ_−１、Ａ_１がＦＩＲフィルタで算出されていればよい。したがって、データ個数変換本体部１３で求められる最終的なエンコーダでの出力点の個数を４４点とすれば、４４×２（＝８８）点が上記ＦＩＲフィルタ１５で算出されればよい。
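The linear interpolation step can be sketched as follows: two FIR outputs that bracket the desired position are blended according to distance. The function name `lerp_output` and the sample values are made up for this example.

```python
def lerp_output(pos_lo, b_lo, pos_hi, b_hi, pos):
    """Linearly interpolate between two FIR outputs that bracket pos."""
    t = (pos - pos_lo) / (pos_hi - pos_lo)
    return (1.0 - t) * b_lo + t * b_hi

N_OUT = 44                 # fixed number of encoder outputs
FIR_POINTS = 2 * N_OUT     # two bracketing FIR outputs per final point

print(lerp_output(26, 1.0, 27, 3.0, 26.25), FIR_POINTS)
```

With 44 final outputs, at most 88 FIR outputs are required, which matches the figure given in the text.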
【００５２】
この必要な出力点を得るための２点Ａ_−１、Ａ_１のｂ〔ｍ〕を求める処理について図７のフローチャートを用いて説明する。
【００５３】
ステップＳ１では、入力側の角周波数をＯ_Ｓ（サンプリングレシオ）で割ったω_０ｆと、出力側の角周波数ω_０を求める。第１の実施例では、Ｏ_Ｓ（例えば８）倍のサンプリングを行っているので、スペクトルは、入力の角周波数の１／Ｏ_Ｓのインターバルで立っている。そのため、Ｏ_Ｓで割った値ω_０ｆを出す。０〜πまでを例えば１０２４のグリッドで表現すると、このω_０ｆは、1024／Ｍ×１／Ｏ_Ｓとなる。また、欲しい点（出力側）の角周波数はω_０であり、このω_０が1024／Ｍ' となる。ここで、Ｍ' は、出力側のハーモニクスの数である。
ステップＳ２では、入力側ハーモニクスのインデックスｉ及び出力側ハーモニクスのインデックスiiを初期化する。
【００５４】
ステップＳ３では、上記係数セットＰを初期化する。
【００５５】
ステップＳ４では、入力側ハーモニクスのインデックスｉと係数セットＰとにより求めたいデータの位置Ａ_０を検索（スキャン）する。すなわち、求めたいデータの位置Ａ_０（＝ω_０ ×ii）を、ｉとＰによるスキャンの位置Ａ_１（＝ｉ×Ｏ_Ｓ＋Ｐ＋１）が越えたか否かを判定する。例えば、始めは上記ステップＳ２、Ｓ３でｉとＰが初期化されているのでｉ＝０、Ｐ＝０として検索する。ここで、ＹＥＳを判定するとステップＳ５に進み、ＮＯを判定するとステップＳ７に進む。
【００５６】
ステップＳ５では、求めたいデータの位置Ａ_０（＝ω_０ ×ii）を越えたｉとＰによるスキャンの位置Ａ_１（＝ｉ×Ｏ_Ｓ＋Ｐ＋１）でのｂ〔ｍ〕、すなわち、ｂ〔ｉ×Ｏ_Ｓ＋Ｐ＋１〕とその一つ前（Ａ_−１）のｂ〔ｉ×Ｏ_Ｓ＋Ｐ〕とを求める。このｂ〔ｉ×Ｏ_Ｓ＋Ｐ＋１〕とｂ〔ｉ×Ｏ_Ｓ＋Ｐ〕とは上記求めたいデータの位置Ａ_０（＝ω_０ ×ii）を挟み込むような位置（Ａ_１とＡ_−１の間）でのｂ〔ｍ〕となる。
【００５７】
ステップＳ６では、次に求めたいデータの位置を移動するため、出力側ハーモニクスのインデックスiiをインクリメントする。
【００５８】
ステップＳ７では、スキャンの位置を移動するために係数セットＰをインクリメントする。このときｉは０のままである。すなわち、ｉ＝０のまま、Ｐを０から１に変える。
【００５９】
ステップＳ８では、係数セットＰがＯ_Ｓの値と一致したか否かを判定する。Ｐは０・・・７までの８個であり、Ｏ_Ｓも８としている。ここで、ＹＥＳを判定するとステップＳ９に進み、ＮＯを判定するとステップＳ４に進む。
【００６０】
ステップＳ９では、入力側ハーモニクスのインデックスｉをインクリメントする。そして、ステップＳ１０に進む。
【００６１】
ステップＳ１０では、上記ｉが可変個数のデータの数（Ｍ個）と等しくなったか否かを判定する。ここでＹＥＳを判定するとこのフローは終了となり、ＮＯを判定するとステップＳ３に戻る。
【００６２】
以上のフローチャートより、本実施例はＯ_Ｓ（ここではＯ_Ｓ＝８）倍でオーバーサンプリングピッチ（角周波数）ω_０ｆのインターバルで周波数をインクリメントしてゆき出力として欲しい点を越えたところでのｂ〔ｍ〕とその一つ手前のｂ〔ｍ〕とを求めている。このようにすれば、出力点を直線補間で求めるのに必要な左右の点が全て算出されることになる。
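The scan of Fig. 7 finds, for every desired output point, the two oversampled positions that bracket it. The sketch below reaches the same kind of pairs by direct computation instead of the incremental loop over i and P. That simplification, the function name `bracketing_indices`, and the use of floating-point positions are assumptions of this example, not the patent's procedure.

```python
import math

def bracketing_indices(m_in, m_out, o_s=8):
    """For each output ii, return (lower, upper, frac).

    lower and upper index the oversampled sequence b[]; frac is the
    fractional distance of the desired point from lower.
    """
    # Output spacing measured in oversampled input samples:
    # omega_0 / omega_0f = (1024 / M') / (1024 / (M * O_S)).
    step = m_in * o_s / m_out
    pairs = []
    for ii in range(m_out):
        pos = ii * step
        lower = math.floor(pos)
        pairs.append((lower, lower + 1, pos - lower))
    return pairs

pairs = bracketing_indices(m_in=20, m_out=44)   # M = 20 is a made-up example
print(pairs[:3])
```

Each pair selects an input index i = lower // O_S and a coefficient set P = lower % O_S for the lower point, and likewise for the upper point.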
【００６３】
次に、上記図７に示したフローチャートの処理によって求められた、出力として欲しい点を越えたところでのｂ〔ｍ〕とその一つ手前のｂ〔ｍ〕を直線補間部１６により直線補間する処理を図８のフローチャートを用いて説明する。
【００６４】
ステップＳ２１では、出力角周波数ω_０と、入力角周波数ω_０ｆとを求める。これは、上記図７に示したステップＳ１と同様である。
【００６５】
ステップＳ２２では、以後のフローが入力側のハーモニクスの８倍のインデックスｉでインクリメントされるのでこのｉを初期化する。
【００６６】
ステップＳ２３では、ｉ＝０になっているか否かを判別する。ここでＹＥＳを判別するとステップＳ２４に進み、ＮＯを判別するとステップＳ２５に進む。
【００６７】
ステップＳ２４、ステップＳ２５では、図６に示すようにある一つの区間に着目して、その幅をｂ_ｗとし、上限をｕ_ｂ、下限をｌ_ｂとしている。この上限ｕ_ｂは、inint （ｉ＋１）×ω_０ｆとなり、下限ｌ_ｂは、inint ｉ×ω_０ｆとなる。ここで、inint はinint （ｘ）とするとき、ｘに最も近い数を返す関数である。また、上記下限ｌ_ｂは、一回前の上限ｕ_ｂとなる。したがって、ｂ_ｗは、上限ｕ_ｂと下限ｌ_ｂとの差になる。
【００６８】
上記ステップＳ２４では、下限ｌ_ｂを０とし、ステップＳ２６に進む。
【００６９】
上記ステップＳ２５では、下限ｌ_ｂと上限ｕ_ｂとを一致させる。
【００７０】
ステップＳ２６では、上述したように上限ｕ_ｂをinint （ｉ＋１）×ω_０ｆと設定する。
【００７１】
ステップＳ２７では、上限ｕ_ｂと下限ｌ_ｂとの差であるｂ_ｗを求める。そして、このｂ_ｗの間をスキャンして、直線補間値ｃ〔ii〕を求める。
【００７２】
ステップＳ２８では、図６に示す求めようとするｃ〔ii〕と下限ｌ_ｂとの差ｉ_ｄｘを０に設定する。すなわち、ｉ_ｄｘ＝０の位置（下限ｌ_ｂと一致）からスキャンを開始するスキャン開始位置を設定する。
【００７３】
ステップＳ２９では、上述したように下限ｌ_ｂからスキャンｊを開始する。
【００７４】
ステップＳ３０では、スキャンｊが求めようとするｃ〔ii〕の位置と一致したか否かを判別する。ここで、ＹＥＳを判別するとステップＳ３１に進み、ＮＯを判別するとステップＳ３２に進む。
【００７５】
ステップＳ３１では、位置関係に関連する重み付けを考慮したｃ〔ii〕を求める。ここで、例えば、ｉ_ｄｘが０のときは、ｃ〔ii〕＝ｂ〔ｉ〕となり、ｉ_ｄｘがｂ_ｗのときは、ｃ〔ii〕＝ｂ〔ｉ＋１〕となる。
ステップＳ３２では、ｉ_ｄｘをインクリメントする。そして、ステップＳ３３では、出力ハーモニクスのインデックスiiが出力ハーモニクスの数Ｍ' より大きくなったか否かを判別する。ここで、ＹＥＳを判別すると、このフローは終了となり、ＮＯを判別するとステップＳ３４に進む。
【００７６】
ステップＳ３４では、スキャンｊの繰り返しを始める。
【００７７】
ステップＳ３５では、スキャンｊが上限ｕ_ｂまで到達したか否かを判別する。ここで、ＹＥＳを判別するとステップＳ３６に進み、ＮＯを判別するとステップＳ３０に戻る。
【００７８】
ステップＳ３６では、入力側のハーモニクスｉをインクリメントする。
【００７９】
ステップＳ３７では、ｉが入力ハーモニクスＭとＯ_Ｓとの積よりも大きくなったか否かを判別する。ここで、ＹＥＳを判別するとこのフローは終了となるが、ＮＯを判別するとステップＳ２３に戻る。
【００８０】
以上のフローチャートより、本実施例は、上記図７のフローチャートの処理で求めたｂ〔ｍ〕を直線補間部１６により直線補間するだけで、必要な点だけを求められる。
【００８１】
このように第１の実施例は、必要な点のみを求めることによって、個数が可変とされたデータを一定個数にすることができる。そのため、演算量が減少する。
【００８２】
このようにして、一定サンプル数のデータに変換した数列に必要に応じてブロック間、あるいはフレーム間で差分をとり、ベクトル量子化を施して、そのインデックスを伝送するようにすればよい。
【００８３】
上述した第１の実施例は、ＭＢＥボコーダにより算出されたスペクトルエンベロープの個数が可変とされた振幅データを一定個数に変換する方法であったが、以下、第２の実施例として、一定個数にされたデータをデータ内容に応じた個数のデータに変換するデータ個数変換方法を説明する。この第２の実施例は例えば音声信号を合成するデコーダ側に適用される。すなわち、デコーダ側では、上記インデックスより、ベクトル量子化及び逆量子化された数列の一定個数とされた波形データを得て、そのデータ列を、同様の方法で、すなわち帯域制限オーバーサンプリング、直線補間等を施すことにより、データの内容に応じた個数のＭ個の数列に変換する。
【００８４】
図９は第２の実施例の概略構成を示している。
【００８５】
上記第１の実施例において、一定個数とされた入力データは入力端子２１を介してデータ個数変換本体部２２に供給され、このデータ個数変換本体部２２で可変個数のデータとされて出力端子２６から出力される。このデータ個数変換本体部２２は、スペクトルエンベロープ拡張部２３、帯域制限型ＦＩＲフィルタ２４及び直線補間部２５から成っている。
【００８６】
入力されたブロック毎に一定個数の入力データは、スペクトルエンベロープ拡張部２３でスペクトルエンベロープの両端の値を延長される。この両端が前後に延長されたスペクトルエンベロープは、ＦＩＲフィルタ２４に供給される。このＦＩＲフィルタ２４はスペクトルエンベロープが延長されることによりデータ個数が拡大されたデータのサンプル点に対しそれぞれ異なる複数の位相と対応した複数の係数セットの内の一定個数のデータの各位置の近傍の位置に対応する係数セットを用いることにより、中間的な出力データを求める。そして、この中間的な出力データは直線補間部２５に供給される。この直線補間部２５は上記中間的な出力データを直線補間し、出力端子２６から間引きされ、データ内容に応じた可変個数のデータを出力する。
【００８７】
この第２の実施例は、必要な点のみを求めることによって、個数が一定とされたデータをデータ内容に応じた個数に変換することができる。そのため、演算量が減少される。
【００８８】
ここで、第１の実施例による乗算の回数は、求めるデータの個数を４４個とすれば、その２倍の８８個のデータに対し、８回の乗算が施されることになり、７０４回の乗算となる。これは、上述した高速化手法を用いたＦＦＴ、ＩＦＦＴの乗算の回数の合計３１４８８回の１／４５となる。また、第２の実施例による乗算の回数は、求めるデータの個数を６０個とすれば、その２倍の１２０個のデータに対し、８回の乗算がほどこされることになる。これは、上述した高速化手法を用いたＦＦＴ、ＩＦＦＴの乗算の回数の合計３１４８８回の１／３０となる。
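Multiplying out the figures quoted in this paragraph gives a quick check of the stated ratios of roughly 1/45 and 1/30. The helper name `fir_multiplies` is hypothetical; the numbers 44, 60, 8 and 31488 are the ones given in the text.

```python
FFT_IFFT_MULTS = 2816 + 28672    # real-input FFT (N=256) + IFFT (N=2048)

def fir_multiplies(n_outputs, points_per_output=2, taps_per_point=8):
    """Multiplications when each output needs two 8-tap FIR evaluations."""
    return n_outputs * points_per_output * taps_per_point

encoder = fir_multiplies(44)     # first embodiment
decoder = fir_multiplies(60)     # second embodiment
print(encoder, FFT_IFFT_MULTS / encoder)
print(decoder, FFT_IFFT_MULTS / decoder)
```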
【００８９】
次に、上述したようなデータ数変換方法が適用可能な、音声信号の合成分析符号化装置（いわゆるボコーダ）の一種のＭＢＥ（Multiband Excitation: マルチバンド励起）ボコーダの具体例について、図面を参照しながら説明する。
【００９０】
以下に説明するＭＢＥボコーダは、D.W. Griffin and J.S. Lim, “Multiband Excitation Vocoder," IEEE Trans.Acoustics,Speech,and Signal Processing, vol.36, No.8, pp.1223-1235, Aug. 1988 に開示されているものであり、従来のＰＡＲＣＯＲ（PARtial auto-CORrelation: 偏自己相関）ボコーダ等では、音声のモデル化の際に有声音区間と無声音区間とをブロックあるいはフレーム毎に切り換えていたのに対し、ＭＢＥボコーダでは、同時刻（同じブロックあるいはフレーム内）の周波数軸領域に有声音（Voiced）区間と無声音（Unvoiced）区間とが存在するという仮定でモデル化している。
【００９１】
図１０は、上記ＭＢＥボコーダに本発明を適用した実施例の全体の概略構成を示すブロック図である。
【００９２】
この図１０において、入力端子１０１には音声信号が供給されるようになっており、この入力音声信号は、ＨＰＦ（ハイパスフィルタ）等のフィルタ１０２に送られて、いわゆるＤＣ（直流）オフセット分の除去や帯域制限（例えば２００〜３４００Hzに制限）のための少なくとも低域成分（２００Hz以下）の除去が行われる。このフィルタ１０２を介して得られた信号は、ピッチ抽出部１０３及び窓かけ処理部１０４にそれぞれ送られる。ピッチ抽出部１０３では、入力音声信号データが所定サンプル数Ｎ（例えばＮ＝２５６）単位でブロック分割され（あるいは方形窓による切り出しが行われ）、このブロック内の音声信号についてのピッチ抽出が行われる。このような切り出しブロック（２５６サンプル）を、例えば図１１のＡに示すようにＬサンプル（例えばＬ＝１６０）のフレーム間隔で時間軸方向に移動させており、各ブロック間のオーバラップはＮ−Ｌサンプル（例えば９６サンプル）となっている。また、窓かけ処理部１０４では、１ブロックＮサンプルに対して所定の窓関数、例えばハミング窓をかけ、この窓かけブロックを１フレームＬサンプルの間隔で時間軸方向に順次移動させている。
【００９３】
このような窓かけ処理を数式で表すと、
ｘ_ｗ (k,q) ＝ｘ(q) ｗ(kL-q) ・・・（７）
となる。この（７）式において、ｋはブロック番号を、ｑはデータの時間インデックス（サンプル番号）を表し、処理前の入力信号のｑ番目のデータｘ(q) に対して第ｋブロックの窓（ウィンドウ）関数ｗ(kL-q)により窓かけ処理されることによりデータｘ_ｗ (k,q) が得られることを示している。ピッチ抽出部１０３内での図１１のＡに示すような方形窓の場合の窓関数ｗ_ｒ (r) は、
ｗ_ｒ (r) ＝１　　０≦ｒ＜Ｎ
　　　　 ＝０　　ｒ＜０，Ｎ≦ｒ・・・（８）

また、窓かけ処理部１０４での図１１のＢに示すようなハミング窓の場合の窓関数ｗ_ｈ (r) は、
ｗ_ｈ (r) ＝ 0.54 − 0.46 cos(2πr/(N-1))　　０≦ｒ＜Ｎ
　　　　 ＝０　　ｒ＜０，Ｎ≦ｒ・・・（９）

である。このような窓関数ｗ_ｒ (r) あるいはｗ_ｈ (r) を用いるときの上記（７）式の窓関数ｗ(r) （＝ｗ(kL-q)）の非零区間は、
０≦ｋＬ−ｑ＜Ｎ
これを変形して、
ｋＬ−Ｎ＜ｑ≦ｋＬ
従って例えば上記方形窓の場合に窓関数ｗ_ｒ (kL-q)＝１となるのは、図１２に示すように、ｋＬ−Ｎ＜ｑ≦ｋＬのときとなる。また、上記（７）〜（９）式は、長さＮ（＝２５６）サンプルの窓が、Ｌ（＝１６０）サンプルずつ前進してゆくことを示している。以下、上記（８）式、（９）式の各窓関数で切り出された各Ｎ点（０≦ｒ＜Ｎ）の非零サンプル列を、それぞれｘ_ｗｒ(k,r) 、ｘ_ｗｈ(k,r) と表すことにする。
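The non-zero interval kL - N < q <= kL and the resulting block overlap can be checked with a short Python sketch. The function name `nonzero_range` is an assumption of this example; N = 256 and L = 160 are the values given in the text.

```python
N = 256   # block (window) length in samples
L = 160   # frame advance in samples

def nonzero_range(k, n=N, frame=L):
    """Sample indices q with 0 <= k*L - q < N, i.e. k*L - N < q <= k*L."""
    return range(k * frame - n + 1, k * frame + 1)

block2 = nonzero_range(2)
block3 = nonzero_range(3)
overlap = len(set(block2) & set(block3))
print(len(block2), overlap)
```

Consecutive blocks share N - L = 96 samples, in line with the overlap stated for the pitch extraction unit.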
【００９４】
窓かけ処理部１０４では、図１３に示すように、上記（９）式のハミング窓がかけられた１ブロック２５６サンプルのサンプル列ｘ_ｗｈ(k,r) に対して１７９２サンプル分の０データが付加されて（いわゆる０詰めされて）２０４８サンプルとされ、この２０４８サンプルの時間軸データ列に対して、直交変換部１０５により例えばＦＦＴ（高速フーリエ変換）等の直交変換処理が施される。あるいは、２５６点のままで（０詰めなしで）ＦＦＴを施してもよい。
【００９５】
ピッチ抽出部１０３では、上記ｘ_ｗｒ(k,r) のサンプル列（１ブロックＮサンプル）に基づいてピッチ抽出が行われる。このピッチ抽出法には、時間波形の周期性や、スペクトルの周期的周波数構造や、自己相関関数を用いるもの等が知られているが、本実施例では、センタクリップ波形の自己相関法を採用している。このときのブロック内でのセンタクリップレベルについては、１ブロックにつき１つのクリップレベルを設定してもよいが、ブロックを細分割した各部（各サブブロック）の信号のピークレベル等を検出し、これらの各サブブロックのピークレベル等の差が大きいときに、ブロック内でクリップレベルを段階的にあるいは連続的に変化させるようにしている。このセンタクリップ波形の自己相関データのピーク位置に基づいてピッチ周期を決めている。このとき、現在フレームに属する自己相関データ（自己相関は１ブロックＮサンプルのデータを対象として求められる）から複数のピークを求めておき、これらの複数のピークの内の最大ピークが所定の閾値以上のときには該最大ピーク位置をピッチ周期とし、それ以外のときには、現在フレーム以外のフレーム、例えば前後のフレームで求められたピッチに対して所定の関係を満たすピッチ範囲内、例えば前フレームのピッチを中心として±２０％の範囲内にあるピークを求め、このピーク位置に基づいて現在フレームのピッチを決定するようにしている。このピッチ抽出部１０３ではオープンループによる比較的ラフなピッチのサーチが行われ、抽出されたピッチデータは高精度（ファイン）ピッチサーチ部１０６に送られて、クローズドループによる高精度のピッチサーチ（ピッチのファインサーチ）が行われる。
【００９６】
高精度（ファイン）ピッチサーチ部１０６には、ピッチ抽出部１０３で抽出された整数（インテジャー）値の粗（ラフ）ピッチデータと、直交変換部１０５により例えばＦＦＴされた周波数軸上のデータとが供給されている。この高精度ピッチサーチ部１０６では、上記粗ピッチデータ値を中心に、0.２〜0.５きざみで±数サンプルずつ振って、最適な小数点付き（フローティング）のファインピッチデータの値へ追い込む。このときのファインサーチの手法として、いわゆる合成による分析 (Analysis by Synthesis)法を用い、合成されたパワースペクトルが原音のパワースペクトルに最も近くなるようにピッチを選んでいる。
【００９７】
このピッチのファインサーチについて説明する。先ず、上記ＭＢＥボコーダにおいては、上記ＦＦＴ等により直交変換された周波数軸上のスペクトルデータとしてのＳ(j) を
Ｓ(j) ＝Ｈ(j) ｜Ｅ(j)｜０＜ｊ＜Ｊ・・・（10）
と表現するようなモデルを想定している。ここで、Ｊはω_ｓ／４π＝ｆ_ｓ／２に対応し、サンプリング周波数ｆ_ｓ＝ω_ｓ／２πが例えば８ｋHzのときには４ｋHzに対応する。上記（10）式中において、周波数軸上のスペクトルデータＳ(j) が図１４のＡに示すような波形のとき、Ｈ(j) は、図１４のＢに示すような元のスペクトルデータＳ(j) のスペクトル包絡線（エンベロープ）を示し、Ｅ(j) は、図１４のＣに示すような等レベルで周期的な励起信号（エキサイテイション）のスペクトルを示している。すなわち、ＦＦＴスペクトルＳ(j) は、スペクトルエンベロープＨ(j) と励起信号のパワースペクトル｜Ｅ(j)｜との積としてモデル化される。
【００９８】
上記励起信号のパワースペクトル｜Ｅ(j)｜は、上記ピッチに応じて決定される周波数軸上の波形の周期性（ピッチ構造）を考慮して、１つの帯域（バンド）の波形に相当するスペクトル波形を周波数軸上の各バンド毎に繰り返すように配列することにより形成される。この１バンド分の波形は、例えば上記図１３に示すような２５６サンプルのハミング窓関数に１７９２サンプル分の０データを付加（０詰め）した波形を時間軸信号と見なしてＦＦＴし、得られた周波数軸上のある帯域幅を持つインパルス波形を上記ピッチに応じて切り出すことにより形成することができる。
【００９９】
次に、上記ピッチに応じて分割された各バンド毎に、上記Ｈ(j) を代表させるような（各バンド毎のエラーを最小化するような）値（一種の振幅）｜Ａ_ｍ｜を求める。ここで、例えば第ｍバンド（第ｍ高調波の帯域）の下限、上限の点をそれぞれａ_ｍ、ｂ_ｍとするとき、この第ｍバンドのエラーε_ｍは、
【０１００】
【数６】

【０１０１】
で表せる。このエラーε_ｍを最小化するような｜Ａ_ｍ｜は、
【０１０２】
【数７】

【０１０３】
となり、この（12）式の｜Ａ_ｍ｜のとき、エラーε_ｍを最小化する。このような振幅｜Ａ_ｍ｜を各バンド毎に求め、得られた各振幅｜Ａ_ｍ｜を用いて上記（11）式で定義された各バンド毎のエラーε_ｍを求める。次に、このような各バンド毎のエラーε_ｍの全バンドの総和値Σε_ｍを求める。さらに、このような全バンドのエラー総和値Σε_ｍを、いくつかの微小に異なるピッチについて求め、エラー総和値Σε_ｍが最小となるようなピッチを求める。
【０１０４】
すなわち、上記ピッチ抽出部１０３で求められたラフピッチを中心として、例えば 0.25 きざみで上下に数種類ずつ用意する。これらの複数種類の微小に異なるピッチの各ピッチに対してそれぞれ上記エラー総和値Σε_ｍを求める。この場合、ピッチが定まるとバンド幅が決まり、上記（13）式より、周波数軸上データのパワースペクトル｜Ｓ(j) ｜と励起信号スペクトル｜Ｅ(j) ｜とを用いて上記（11）式のエラーε_ｍを求め、その全バンドの総和値Σε_ｍを求めることができる。このエラー総和値Σε_ｍを各ピッチ毎に求め、最小となるエラー総和値に対応するピッチを最適のピッチとして決定するわけである。以上のようにして高精度ピッチサーチ部１０６で最適のファイン（例えば 0.25 きざみ）ピッチが求められ、この最適ピッチに対応する振幅｜Ａ_ｍ｜が決定される。
【０１０５】
以上ピッチのファインサーチの説明においては、説明を簡略化するために、全バンドが有声音（Voiced）の場合を想定しているが、上述したようにＭＢＥボコーダにおいては、同時刻の周波数軸上に無声音（Unvoiced）領域が存在するというモデルを採用していることから、上記各バンド毎に有声音／無声音の判別を行うことが必要とされる。
【０１０６】
上記高精度ピッチサーチ部１０６からの最適ピッチ及び振幅｜Ａ_ｍ｜のデータは、有声音／無声音判別部１０７に送られ、上記各バンド毎に有声音／無声音の判別が行われる。この判別のために、ＮＳＲ（ノイズｔｏシグナル比）を利用する。すなわち、第ｍバンドのＮＳＲは、
【０１０７】
【数８】

【０１０８】
と表せ、このＮＳＲ値が所定の閾値（例えば0.３）より大のとき（エラーが大きい）ときには、そのバンドでの｜Ａ_ｍ｜｜Ｅ(j) ｜による｜Ｓ(j) ｜の近似が良くない（上記励起信号｜Ｅ(j) ｜が基底として不適当である）と判断でき、当該バンドをＵＶ（Unvoiced、無声音）と判別する。これ以外のときは、近似がある程度良好に行われていると判断でき、そのバンドをＶ（Voiced、有声音）と判別する。
【０１０９】
次に、振幅再評価部１０８には、直交変換部１０５からの周波数軸上データ、高精度ピッチサーチ部１０６からのファインピッチと評価された振幅｜Ａ_ｍ｜との各データ、及び上記有声音／無声音判別部１０７からのＶ／ＵＶ（有声音／無声音）判別データが供給されている。この振幅再評価部１０８では、有声音／無声音判別部１０７において無声音（ＵＶ）と判別されたバンドに関して、再度振幅を求めている。このＵＶのバンドについての振幅｜Ａ_ｍ｜_ＵＶは、
【０１１０】
【数９】

【０１１１】
にて求められる。
【０１１２】
この振幅再評価部１０８からのデータは、データ数変換（一種のサンプリングレート変換）部１０９に送られる。このデータ数変換部１０９は、上記ピッチに応じて周波数軸上での分割帯域数が異なり、データ数（特に振幅データの数）が異なることを考慮して、一定の個数にするためのものである。すなわち、例えば有効帯域を３４００Hzまでとすると、この有効帯域が上記ピッチに応じて、８バンド〜６３バンドに分割されることになり、これらの各バンド毎に得られる上記振幅｜Ａ_ｍ｜（ＵＶバンドの振幅｜Ａ_ｍ｜_ＵＶも含む）データの個数ｍ_ＭＸ＋１も８〜６３と変化することになる。このためデータ数変換部１０９では、この可変個数ｍ_ＭＸ＋１の振幅データを一定個数（例えば４４個）のデータに変換している。
【０１１３】
ここで本第１の実施例においては、上記図１〜図８と共に説明したように、周波数軸上の有効帯域１ブロック分の振幅データに対して、ブロック内の両端のデータを延長してデータ個数を拡大し、帯域制限型ＦＩＲフィルタによるフィルタ処理を施し、さらに直線補間を施すことにより一定個数（例えば４４個）のデータを得ている。
【０１１４】
このデータ数変換部１０９からのデータ（上記一定個数の振幅データ）がベクトル量子化部１１０に送られて、所定個数のデータ毎にまとめられてベクトルとされ、ベクトル量子化が施される。ベクトル量子化部１１０からの量子化出力データは、ＣＲＣ＆レート１／２畳込み符号付加部１１１に供給されると共にフレームインターリーブ部１１２に供給される。また、上記高精度のピッチサーチ部１０６からの高精度（ファイン）ピッチデータ及び上記有声音／無声音判別部１０７からの有声音／無声音（Ｖ／ＵＶ）判別データも上記ＣＲＣ＆レート１／２畳込み符号付加部１１１に供給される。
【０１１５】
ここで、上記ＣＲＣ＆レート１／２畳込み符号付加部１１１は、上記ファインピッチデータ、Ｖ／ＵＶ判別データ及び量子化出力データを用いて、スペクトルエンベロープの量子化を階層的な構造とし、その出力インデックスの重要度を分けることで効果的に畳込み符号による誤り訂正を行う。
【０１１６】
これは、本件出願人が特願平４−９１４２２号において、提案した高能率符号化方法、すなわち、Ｍ次元ベクトルを、Ｓ次元（Ｓ＜Ｍ）ベクトルに次元低下させてベクトル量子化するような、階層構造化されたコードブックを有する量子化を行わせる方法と同様に誤り訂正符号の効果的な適用が可能となる方法である。
【０１１７】
具体的に、このエンコーダ側のＣＲＣ＆畳込み符号化は、以下のような原理である。図１５は、ＣＲＣ＆畳込み符号化の原理を説明するための機能ブロック図である。例えば、音声符号器１２１から出力された音声パラメータのうち、聴覚上特に重要な部分（クラス１）８０ビットとそれ以外の部分（クラス２）４０ビットとに分ける。クラス１のうちさらに重要な５０ビットについてＣＲＣ計算ブロック１２２によりＣＲＣを計算し、７ビットの結果を得る。クラス１の８０ビットとＣＲＣの７ビットと畳込み符号化器の初期値を０に戻すためのテールビット５ビットの合計９２ビットを畳込み符号化部１２３に入力し、１８４ビットの出力を得る。畳込み符号化された１８４ビットとクラス２ビットの４０ビットの計２２４ビットにつき、２スロットインターリーブ器１２４により、インターリーブを行い、その出力として２２４ビットを伝送する。
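The bit allocation in this paragraph can be verified with simple arithmetic. The constant names below are made up for this sketch; the bit counts are the ones stated in the text.

```python
CLASS1_BITS = 80     # perceptually important parameter bits
CLASS2_BITS = 40     # remaining bits, sent without convolutional coding
CRC_BITS = 7         # CRC computed over the 50 most important class-1 bits
TAIL_BITS = 5        # return the convolutional encoder to state 0
CODE_RATE_INV = 2    # rate 1/2: two output bits per input bit

encoder_input = CLASS1_BITS + CRC_BITS + TAIL_BITS
encoder_output = encoder_input * CODE_RATE_INV
frame_bits = encoder_output + CLASS2_BITS
print(encoder_input, encoder_output, frame_bits)
```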
【０１１８】
この２スロットインターリーブ器１２４に相当するのが図１０のフレームインターリーブ部１１２であり、その出力が出力端子１１３から伝送される。
【０１１９】
なお、これらの各データは、上記Ｎサンプル（例えば２５６サンプル）のブロック内のデータに対して処理を施すことにより得られるものであるが、ブロックは時間軸上を上記Ｌサンプルのフレームを単位として前進することから、伝送するデータは上記フレーム単位で得られる。すなわち、上記フレーム周期でピッチデータ、Ｖ／ＵＶ判別データ、振幅データが更新されることになる。
【０１２０】
次に、本発明に係る復号装置の実施例として、伝送されて得られた上記出力データに基づき音声信号を合成するための合成側（デコード側）の概略構成について、図１６を参照しながら説明する。
【０１２１】
この図１６において、入力端子１３１には、伝送されてきたＣＲＣ＆レート１／２畳込み符号が付加された出力データが供給される。入力端子１３１からの出力データは、フレームデインターリーブ１３２に供給され、デインターリーブされる。デインターリーブされたデータは、ビタビ復号＆ＣＲＣ検出部１３３に供給され、復号化される。
【０１２２】
そして、マスク処理部１３４が、フレームデインターリーブ１３２からのデータをマスク処理し、量子化振幅データを逆ベクトル量子化部１３５に供給する。
【０１２３】
この逆量子化部１３５も階層構造化されており、各階層のインデックスデータに基づいて逆ベクトル化されたデータを合成して出力する。この逆量子化部１３５からの出力データは、データ数逆変換部１３６に送られて逆変換される。このデータ数逆変換部１３６では、上述した図９の説明と同様な（逆）変換が行われ、得られた振幅データが有声音合成部１３７及び無声音合成部１３８に送られる。また、上記マスク処理部１３４は、符号化ピッチデータをピッチ復号化部１３９に供給する。このピッチ復号化器１３９で復号されたピッチデータは、データ数逆変換部１３６、有声音合成部１３７及び無声音合成部１３８に送られる。また、上記マスク処理部１３４は、Ｖ／ＵＶ判別データを有声音合成部１３７及び無声音合成部１３８に供給する。
【０１２４】
有声音合成部１３７では例えば余弦(cosine)波合成により時間軸上の有声音波形を合成し、無声音合成部１３８では例えばホワイトノイズをバンドパスフィルタでフィルタリングして時間軸上の無声音波形を合成し、これらの各有声音合成波形と無声音合成波形とを加算部１４０で加算合成して、出力端子１４１より取り出すようにしている。この場合、上記振幅データ、ピッチデータ及びＶ／ＵＶ判別データは、上記分析時の１フレーム（Ｌサンプル、例えば１６０サンプル）毎に更新されて与えられるが、フレーム間の連続性を高める（円滑化する）ために、上記振幅データやピッチデータの各値を１フレーム中の例えば中心位置における各データ値とし、次のフレームの中心位置までの間（合成時の１フレーム）の各データ値を補間により求める。すなわち、合成時の１フレーム（例えば上記分析フレームの中心から次の分析フレームの中心まで）において、先端サンプル点での各データ値と終端（次の合成フレームの先端）サンプル点での各データ値とが与えられ、これらのサンプル点間の各データ値を補間により求めるようにしている。
【０１２５】
以下、有声音合成部１３７における合成処理を詳細に説明する。
【０１２６】
上記Ｖ（有声音）と判別された第ｍバンド（第ｍ高調波の帯域）における時間軸上の上記１合成フレーム（Ｌサンプル、例えば１６０サンプル）分の有声音をＶ_ｍ (n) とするとき、この合成フレーム内の時間インデックス（サンプル番号）ｎを用いて、
Ｖ_ｍ (n) ＝Ａ_ｍ (n) cos(θ_ｍ (n)) ０≦ｎ＜Ｌ・・・（15）
と表すことができる。全バンドの内のＶ（有声音）と判別された全てのバンドの有声音を加算（ΣＶ_ｍ (n) ）して最終的な有声音Ｖ(n) を合成する。
【０１２７】
この（15）式中のＡ_ｍ (n) は、上記合成フレームの先端から終端までの間で補間された第ｍ高調波の振幅である。最も簡単には、フレーム単位で更新される振幅データの第ｍ高調波の値を直線補間すればよい。すなわち、上記合成フレームの先端（ｎ＝０）での第ｍ高調波の振幅値をＡ_０ｍ、該合成フレームの終端（ｎ＝Ｌ：次の合成フレームの先端）での第ｍ高調波の振幅値をＡ_Ｌｍとするとき、
Ａ_ｍ (n) ＝ (L-n)Ａ_０ｍ／Ｌ＋ｎＡ_Ｌｍ／Ｌ・・・（16）
の式によりＡ_ｍ (n) を計算すればよい。
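Equation (16) is an ordinary linear interpolation of the m-th harmonic amplitude across one synthesis frame. A minimal sketch, with the hypothetical function name `interp_amplitude` and made-up amplitude values:

```python
def interp_amplitude(a_start, a_end, n, frame_len=160):
    """Equation (16): A_m(n) = (L - n) * A_0m / L + n * A_Lm / L."""
    return ((frame_len - n) * a_start + n * a_end) / frame_len

# Amplitude ramps from 2.0 at n = 0 towards 4.0 at n = L.
print([interp_amplitude(2.0, 4.0, n) for n in (0, 80, 160)])
```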
【０１２８】
次に、上記（15）式中の位相θ_ｍ (n) は、
θ_ｍ (n) ＝ｍω_０１ｎ＋ｎ^2 ｍ(ω_Ｌ１−ω_０１)/２Ｌ＋φ_０ｍ＋Δωｎ・・・（17）

により求めることができる。この（17）式中で、φ_０ｍは上記合成フレームの先端（ｎ＝０）での第ｍ高調波の位相（フレーム初期位相）を示し、ω_０１は合成フレーム先端（ｎ＝０）での基本角周波数、ω_Ｌ１は該合成フレームの終端（ｎ＝Ｌ：次の合成フレーム先端）での基本角周波数をそれぞれ示している。上記（17）式中のΔωは、ｎ＝Ｌにおける位相φ_Ｌｍがθ_ｍ (L) に等しくなるような最小のΔωを設定する。
【０１２９】
以下、任意の第ｍバンドにおいて、それぞれｎ＝０、ｎ＝ＬのときのＶ／ＵＶ判別結果に応じた上記振幅Ａ_ｍ (n) 、位相θ_ｍ (n) の求め方を説明する。
【０１３０】
第ｍバンドが、ｎ＝０、ｎ＝ＬのいずれもＶ（有声音）とされる場合に、振幅Ａ_ｍ (n) は、上述した（16）式により、伝送された振幅値Ａ_０ｍ、Ａ_Ｌｍを直線補間して振幅Ａ_ｍ (n) を算出すればよい。位相θ_ｍ (n) は、ｎ＝０でθ_ｍ (0) ＝φ_０ｍからｎ＝Ｌでθ_ｍ (L) がφ_ＬｍとなるようにΔωを設定する。
【０１３１】
次に、ｎ＝０のときＶ（有声音）で、ｎ＝ＬのときＵＶ（無声音）とされる場合に、振幅Ａ_ｍ (n) は、Ａ_ｍ (0) の伝送振幅値Ａ_０ｍからＡ_ｍ (L) で０となるように直線補間する。ｎ＝Ｌでの伝送振幅値Ａ_Ｌｍは無声音の振幅値であり、後述する無声音合成の際に用いられる。位相θ_ｍ (n) は、θ_ｍ (0) ＝φ_０ｍとし、かつΔω＝０とする。
【０１３２】
さらに、ｎ＝０のときＵＶ（無声音）で、ｎ＝ＬのときＶ（有声音）とされる場合には、振幅Ａ_ｍ (n) は、ｎ＝０での振幅Ａ_ｍ (0) を０とし、ｎ＝Ｌで伝送された振幅値Ａ_Ｌｍとなるように直線補間する。位相θ_ｍ (n) については、ｎ＝０での位相θ_ｍ (0) として、フレーム終端での位相値φ_Ｌｍを用いて、
θ_ｍ (0) ＝φ_Ｌｍ−ｍ（ω_０１＋ω_Ｌ１）Ｌ／２・・・（18）
とし、かつΔω＝０とする。
【０１３３】
上記ｎ＝０、ｎ＝ＬのいずれもＶ（有声音）とされる場合に、θ_ｍ (L) がφ_ＬｍとなるようにΔωを設定する手法について説明する。上記（17）式で、ｎ＝Ｌと置くことにより、
φ_Ｌｍ＝ｍω_０１Ｌ＋Ｌ^2 ｍ(ω_Ｌ１−ω_０１)/２Ｌ＋φ_０ｍ＋ΔωＬ
　　　＝ｍ(ω_０１＋ω_Ｌ１)Ｌ/２＋φ_０ｍ＋ΔωＬ

となり、これを整理すると、Δωは、
Δω＝（mod2π((φ_Ｌｍ−φ_０ｍ) − mL(ω_０１＋ω_Ｌ１)/2)）／Ｌ・・・（19）
となる。この（19）式でmod2π(x) とは、ｘの主値を−π〜＋πの間の値で返す関数である。例えば、ｘ＝１.3πのときmod2π(x) ＝−０.7π、ｘ＝２.3πのときmod2π(x) ＝０.3π、ｘ＝−１.3πのときmod2π(x) ＝０.7π、等である。
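The principal-value function mod2π and the Δω of equation (19) can be sketched in Python as below. The function names `mod2pi` and `delta_omega` and the numeric inputs in the usage lines are made up for this example. The choice to map exactly +π to -π is an implementation detail of this sketch that the text does not specify.

```python
import math

def mod2pi(x):
    """Principal value of x, returned in the interval [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi

def delta_omega(phi_0, phi_l, m, w_01, w_l1, frame_len=160):
    """Equation (19): smallest frequency offset that makes theta_m(L) = phi_Lm."""
    return mod2pi((phi_l - phi_0) - m * frame_len * (w_01 + w_l1) / 2.0) / frame_len

# The three worked values from the text, expressed in units of pi.
for x in (1.3, 2.3, -1.3):
    print(x, mod2pi(x * math.pi) / math.pi)

print(delta_omega(phi_0=0.3, phi_l=1.1, m=3, w_01=0.05, w_l1=0.06))
```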
【０１３４】
ここで、図１７のＡは、音声信号のスペクトルの一例を示しており、バンド番号（ハーモニクスナンバ）ｍが８、９、１０の各バンドがＵＶ（無声音）とされ、他のバンドはＶ（有声音）とされている。このＶ（有声音）のバンドの時間軸信号が上記有声音合成部１３７により合成され、ＵＶ（無声音）のバンドの時間軸信号が無声音合成部１３８で合成されるわけである。
【０１３５】
以下、無声音合成部１３８における無声音合成処理を説明する。
【０１３６】
ホワイトノイズ発生部１４２からの時間軸上のホワイトノイズ信号波形を、所定の長さ（例えば２５６サンプル）で適当な窓関数（例えばハミング窓）により窓かけをし、ＳＴＦＴ処理部１４３によりＳＴＦＴ（ショートタームフーリエ変換）処理を施すことにより、図１７のＢに示すようなホワイトノイズの周波数軸上のパワースペクトルを得る。このＳＴＦＴ処理部１４３からのパワースペクトルをバンド振幅処理部１４４に送り、図１７のＣに示すように、上記ＵＶ（無声音）とされたバンド（例えばｍ＝８、９、１０）について上記振幅｜Ａ_ｍ｜_ＵＶを乗算し、他のＶ（有声音）とされたバンドの振幅を０にする。このバンド振幅処理部１４４には上記振幅データ、ピッチデータ、Ｖ／ＵＶ判別データが供給されている。バンド振幅処理部１４４からの出力は、ＩＳＴＦＴ処理部１４５に送られ、位相は元のホワイトノイズの位相を用いて逆ＳＴＦＴ処理を施すことにより時間軸上の信号に変換する。ＩＳＴＦＴ処理部１４５からの出力は、オーバーラップ加算部１４６に送られ、時間軸上で適当な（元の連続的なノイズ波形を復元できるように）重み付けをしながらオーバーラップ及び加算を繰り返し、連続的な時間軸波形を合成する。オーバーラップ加算部１４６からの出力信号が上記加算部１４０に送られる。
【０１３７】
このように、各合成部１３７、１３８において合成されて時間軸上に戻された有声音部及び無声音部の各信号は、加算部１４０により適当な固定の混合比で加算して、出力端子１４１より再生された音声信号を取り出す。
【０１３８】
ここで、上述したデコーダ側のビタビ復号＆ＣＲＣ検出は、以下のような原理である。図１８は、ビタビ復号＆ＣＲＣ検出の原理を説明するための機能ブロック図である。先ず、伝送されてきた２２４ビットを２スロットデインターリーブ器１５１が受信し、デインターリーブする。この２スロットデインターリーブ器１５１の出力をクラス２ビットとエンコードされているクラス１ビットに分け、後者を畳込み復号化器１５２に入力し、復号して、８０ビットのクラス１復号結果と受信ＣＲＣ７ビットを得る。次に、８０ビットのクラス１復号結果からエンコーダで計算したのと同じパラメータビットに相当するものから再びＣＲＣをＣＲＣ計算部１５３により計算し、受信ＣＲＣと比較し、その結果を音声復号器１５４に出力する。
【０１３９】
なお、上記図１０の音声分析側（エンコード側）の構成や図１６の音声合成側（デコード側）の構成については、各部をハードウェア的に記載しているが、いわゆるＤＳＰ（ディジタル信号プロセッサ）等を用いてソフトウェアプログラムにより実現することも可能である。
なお、本発明は上記実施例のみに限定されるものではなく、例えば、音声信号のみならず、音響信号を入力信号として用いることもできる。
【０１４０】
【発明の効果】
以上の説明から明らかなように、本発明に係る符号化装置によれば、入力オーディオ信号をブロックに分割して、ブロック内の可変個数の波形データ又は波形を表すパラメータデータを抽出し、上記抽出された可変個数のデータをブロック毎に一定の個数の基準データと比較するために上記可変個数のデータを上記一定個数に変換して符号化する符号化装置であって、上記可変個数のデータが入力される帯域制限型オーバーサンプリングのためのＦＩＲフィルタで、上記入力データのサンプル点に対してそれぞれ異なる複数の位相と対応した複数の係数セットの内の上記一定個数のデータの各位置に対応する係数セットを用いることにより、出力として必要な上記一定個数のデータを求める手段を有しているため、必要な点のみを計算する間引かれた演算が可能となり、積和の演算回数を大幅に減らせる。
【０１４１】
また、他の発明に係る符号化装置によれば、入力オーディオ信号をブロックに分割して、ブロック内の可変個数の波形データ又は波形を表すパラメータデータを抽出し、上記抽出された可変個数のデータをブロック毎に一定の個数の基準データと比較するために上記可変個数のデータを上記一定個数のデータに変換して符号化する符号化装置であって、上記可変個数のデータが入力される帯域制限型オーバーサンプリングのためのＦＩＲフィルタで、上記入力データのサンプル点に対してそれぞれ異なる複数の位相と対応した複数の係数セットの内の上記一定個数のデータの各位置の近傍の位置に対応する係数セットを用いることにより、中間的な出力データを求める手段と、上記中間的な出力データを補間して必要とされる一定個数のデータを求める手段とを有しているため、必要な点のみを計算する間引かれた演算が可能となり、積和の演算回数を大幅に減らせる。
【０１４２】
また、本発明に係る復号装置によれば、入力オーディオ信号をブロックに分割して、ブロック内の可変個数の波形データ又は波形を表すパラメータデータを抽出し、上記抽出された可変個数のデータをブロック毎に一定の個数の基準データと比較するために上記可変個数のデータを上記一定個数のデータに変換することにより符号化された符号列を受け取り、上記符号列から上記一定個数のデータを復号化し、上記復号化された一定個数のデータから可変個数のデータに逆変換する復号装置であって、上記一定個数のデータが入力される帯域制限型オーバーサンプリングのためのＦＩＲフィルタで、上記入力データのサンプル点に対してそれぞれ異なる複数の位相と対応した複数の係数セットの内の上記可変個数のデータの各位置に対応する係数セットを用いることにより、出力として必要な上記可変個数のデータを求める手段を有しているため、必要な点のみを計算する間引かれた演算が可能となり、積和の演算回数を大幅に減らせる。
【０１４３】
また、他の発明に係る復号装置によれば、入力オーディオ信号をブロックに分割して、ブロック内の可変個数の波形データ又は波形を表すパラメータデータを抽出し、上記抽出された可変個数のデータをブロック毎に一定の個数の基準データと比較するために上記可変個数のデータを上記一定個数のデータに変換することにより符号化された符号列を受け取り、上記符号列から上記一定個数のデータを復号化し、上記復号化された一定個数のデータから可変個数のデータに逆変換する復号装置であって、上記一定個数のデータが入力される帯域制限型オーバーサンプリングのためのＦＩＲフィルタで、上記入力データのサンプル点に対してそれぞれ異なる複数の位相と対応した複数の係数セットの内の上記可変個数のデータの各位置の近傍の位置に対応する係数セットを用いることにより、中間的な出力データを求める手段と、上記中間的な出力データを補間して必要とされる可変個数のデータを求める手段とを有しているため、必要な点のみを計算する間引かれた演算が可能となり、積和の演算回数を大幅に減らせる。
【図面の簡単な説明】
【図１】本発明に係る符号化装置の第１の実施例に用いられるデータ数変換方法を説明するための概略構成を示すブロック図である。
【図２】データ数変化の一例を説明するための波形図である。
【図３】スペクトルエンベロープの拡張を説明するための波形図である。
【図４】ＦＩＲフィルタのフィルタ係数を説明するための図である。
【図５】図４に示されたフィルタ係数を用い実際に出力点を求める例を説明するための図である。
【図６】直線補間で使う値の求め方及び直線補間を説明するための図である。
【図７】直線補間で使う値の求め方を説明するためのフローチャートである。
【図８】直線補間を説明するためのフローチャートである。
【図９】第２の実施例を説明するための図である。
【図１０】本発明に係る符号化装置の実施例の具体例としての音声信号の合成分析符号化装置の分析側（エンコード側）の概略構成を示す機能ブロック図である。
【図１１】窓かけ処理を説明するための図である。
【図１２】窓かけ処理と窓関数との関係を説明するための図である。
【図１３】直交変換（ＦＦＴ）処理対象としての時間軸データを示す図である。
【図１４】周波数軸上のスペクトルデータ、スペクトル包絡線（エンベロープ）及び励起信号のパワースペクトルを示す図である。
【図１５】ＣＲＣ＆畳込み符号を説明するための図である。
【図１６】本発明に係る復号装置の実施例として、データ数変換方法が適用される装置の具体例としての音声信号の合成分析符号化装置の合成側（デコード側）の概略構成を示す機能ブロック図である。
【図１７】音声信号を合成する際の無声音合成を説明するための図である。
【図１８】ＣＲＣ＆畳込み復号を説明するための図である。
【符号の説明】
１２非線形圧縮部、１３データ個数変換本体部、１４スペクトルエンベロープ拡張部、１５帯域制限型ＦＩＲフィルタ、１６直線補間部、１０３ピッチ抽出部、１０４窓かけ処理部、１０５直交変換（ＦＦＴ）部、１０６高精度（ファイン）ピッチサーチ部、１０７有声音／無声音（Ｖ／ＵＶ）判別部、１０８振幅再評価部、１０９データ数変換（データレートコンバート）部、１１０ベクトル量子化部、１１１ＣＲＣ＆畳込み符号化部、１１２フレームインターリーブ部

[0001]
[Field of industrial application]
The present invention relates to an encoding device and a decoding device, and in particular to an encoding device and a decoding device that involve data-number conversion, in which a variable number of data items, such as spectral amplitude data calculated by a speech analysis-synthesis device (vocoder) or the like, is converted into a fixed number of data items.
[0002]
[Prior art]
Various encoding methods are known that compress signals by exploiting the statistical properties of audio signals (including speech signals and acoustic signals) in the time domain and the frequency domain, together with the characteristics of human hearing. These methods are broadly classified into time-domain coding, frequency-domain coding, analysis-synthesis coding, and the like.
[0003]
Examples of high-efficiency coding for speech signals and the like include MBE (Multiband Excitation) coding, SBE (Singleband Excitation) coding, harmonic coding, SBC (Sub-band Coding), LPC (Linear Predictive Coding), DCT (Discrete Cosine Transform), MDCT (Modified DCT), and FFT (Fast Fourier Transform). When various kinds of information data in these schemes, such as spectral amplitudes and their parameters (LSP parameters, α parameters, k parameters, and so on), are quantized, scalar quantization has conventionally been used in most cases.
[0004]
[Problems to be solved by the invention]
If the bit rate is reduced to, for example, about 3 to 4 kbps and the quantization efficiency is to be improved further, scalar quantization produces large quantization noise (distortion) and has been difficult to put into practical use. Attention has therefore turned to vector quantization, in which the time-axis data, frequency-axis data, filter coefficient data, and so on obtained during encoding are not quantized individually. Instead, several data items are grouped into a set (vector) that is represented by a single code.
[0005]
However, the number of spectral amplitude data items in MBE, SBE, LPC, and the like varies with the pitch. Vector-quantizing them directly therefore requires variable-dimension vector quantization, which not only complicates the configuration but also makes it difficult to obtain good characteristics.
[0006]
Likewise, when the difference between the data of successive blocks (frames) is taken before quantization, the difference cannot be taken unless the preceding and following blocks (frames) contain the same number of data items. Converting a variable number of data items to a fixed number is thus sometimes required in the course of data processing, and a data-number conversion with good characteristics is desired.
[0007]
The present applicant therefore proposed, in the specification and drawings of Japanese Patent Application No. Hei 4-92263, a data-number conversion method that can convert a variable number of data items into a fixed number and that has good characteristics, with no ringing or similar artifacts at the end points. In this method, the variable number of data items in each block is nonlinearly compressed by a nonlinear compression unit. A dummy data adding unit then appends dummy data that interpolates from the last data value in the block to the first data value, which expands the number of data items. The result is oversampled by a band-limited oversampling unit that includes a fast Fourier transform (FFT) processing unit, an inverse fast Fourier transform (IFFT) processing unit, and so on, linearly interpolated by a linear interpolation unit, and decimated by a decimation unit to yield a fixed number of sample data items.
[0008]
In the data number conversion method of that application, one block is extended to, for example, 256 samples for the FFT. Next, to realize, for example, 8-times oversampling, mid-zero stuffing is performed, in which 7 (= 8 − 1) zeros per sample are inserted into the 256-sample spectral data obtained by the FFT, to obtain 2048 samples, and an IFFT is computed on these 2048 samples.
[0009]
In an ordinary FFT or IFFT with N samples per block, (N/2 × log₂N) complex multiplications and (N log₂N) complex additions are performed. Here, (N/2 × log₂N) complex multiplications correspond to (N/2 × log₂N × 4) real multiplications, and (N log₂N) complex additions correspond to (N log₂N × 2) real additions. The FFT for N = 256 therefore requires 4096 (= 256/2 × 8 × 4) real multiplications, and the IFFT for N = 2048 requires 45056 (= 2048/2 × 11 × 4), for a total of 49152.
[0010]
Even with a so-called speed-up technique that realizes an N-point FFT with an N/2-point FFT by exploiting an all-real input, N/4 × (log₂N − 1) × 4 + N × 4 real multiplications and N/2 × (log₂N − 1) × 2 + N × 2 real additions are required. That is, the FFT for N = 256 requires 2816 multiplications and 2304 additions, and the IFFT for N = 2048 requires 28672 multiplications and 24576 additions. Multiplications alone thus amount to 31488 operations.
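The operation counts quoted above can be checked with a short calculation. The following Python sketch simply evaluates the formulas given in the text; the function names are illustrative and are not part of the original disclosure.

```python
def log2_int(n: int) -> int:
    """Exact log2 for a power of two (256 -> 8, 2048 -> 11)."""
    return n.bit_length() - 1


def plain_real_mults(n: int) -> int:
    """Ordinary FFT: (N/2 * log2 N) complex multiplications, 4 real ones each."""
    return (n // 2) * log2_int(n) * 4


def fast_real_mults(n: int) -> int:
    """Real-input speed-up: N/4 * (log2 N - 1) * 4 + N * 4 real multiplications."""
    return (n // 4) * (log2_int(n) - 1) * 4 + n * 4


def fast_real_adds(n: int) -> int:
    """Real-input speed-up: N/2 * (log2 N - 1) * 2 + N * 2 real additions."""
    return (n // 2) * (log2_int(n) - 1) * 2 + n * 2


# 256-point FFT followed by a 2048-point IFFT, as in the earlier method.
plain_total = plain_real_mults(256) + plain_real_mults(2048)
fast_total = fast_real_mults(256) + fast_real_mults(2048)
print(plain_total, fast_total)
```

Evaluating the two totals reproduces the figures 49152 and 31488 stated in the text.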
[0011]
Note that the above assumes a data number (sample rate) conversion in which a variable number (8 to 63) of sample data is converted into a fixed number (44) of sample data in a block (frame) during encoding. However, in the case of decoding, a predetermined number (44) of sample data in a block (frame) is converted into a variable number (8 to 63) of sample data by a similar method.
[0012]
Of the 2048 points computed by the IFFT during encoding, only about 44 are actually needed, and even when decoding is taken into account, the number of samples finally required is at most about 63. The fact that the computation could be thinned out accordingly has not been exploited.
[0013]
The present invention has been made in view of these circumstances. Its object is to provide an encoding device and a decoding device using data number conversion that, with a reduced amount of computation, can convert a variable number of data into a fixed number during encoding and can convert a fixed number of data into a variable number during decoding.
[0015]
[Means for Solving the Problems]
  The encoding device according to the present invention divides an input audio signal into blocks, extracts from each block a variable number of sequence data consisting of waveform data or parameter data representing the waveform, and, so that the extracted variable number of sequence data can be compared block by block with a fixed number of reference data, converts the variable number of sequence data into a fixed number of sequence data and encodes it. The encoding device solves the above problem by comprising: storage means for storing a plurality of coefficient sets; means for adding data to both ends of the variable number of sequence data to generate new sequence data consisting of a predetermined fixed number of data; means for selecting from the storage means the coefficient set corresponding to each position of the fixed number of data, multiplying each of the coefficients included in the selected coefficient set by the new sequence data associated with that coefficient, and adding the resulting products to obtain intermediate output data; and means for interpolating the intermediate output data to obtain the required fixed number of sequence data.
[0017]
  The decoding device according to the present invention receives a code sequence produced by dividing an input audio signal into blocks, extracting from each block a variable number of sequence data consisting of waveform data or parameter data representing the waveform, and, so that the extracted variable number of sequence data can be compared block by block with a fixed number of reference data, converting the variable number of sequence data into a fixed number of sequence data and encoding it. The decoding device decodes the fixed number of sequence data from the code sequence and inversely converts the decoded fixed number of sequence data into a variable number of sequence data. It solves the above problem by comprising: storage means for storing a plurality of coefficient sets; means for adding data to both ends of the decoded fixed number of sequence data to generate new sequence data consisting of a predetermined fixed number of data; means for selecting from the storage means the coefficient set corresponding to each position of the fixed number of data, multiplying each of the coefficients included in the selected coefficient set by the new sequence data associated with that coefficient, and adding the resulting products to obtain intermediate output data; and means for interpolating the intermediate output data to obtain the required variable number of sequence data.
[0018]
[Embodiments]
Embodiments of an encoding device and a decoding device according to the present invention will be described below with reference to the drawings.
[0019]
FIG. 1 shows a schematic configuration of the data number conversion used in the encoding device according to a first embodiment of the present invention. The first embodiment is applied to an MBE vocoder described later. That is, it converts the spectral-envelope amplitude data calculated by the MBE vocoder, whose number is variable, into a fixed number of data.
[0020]
In FIG. 1, spectral-envelope amplitude data and the like calculated by the MBE vocoder described later are supplied to an input terminal 11. The amplitude data is obtained, for example, by analyzing a speech signal having a spectrum as shown in FIG. 2A to find the pitch frequency (angular frequency) ω and, taking into account the periodicity of the spectrum corresponding to ω, taking the amplitude at each harmonic position as amplitude data representing the spectral envelope, as shown in FIG. 2. The number of amplitude data within a given effective band (for example, 200 to 3400 Hz) varies depending on the pitch frequency ω. The number of data can therefore be made constant by obtaining the amplitude data of the spectral envelope at each harmonic position of a fixed frequency (angular frequency) ω_c, as shown in FIG. 2C.
[0021]
In the example of FIG. 1, a variable number M (for example, M = 8 to 63) of input data from the input terminal 11 is compressed (logarithmically compressed) by the nonlinear compression unit 12 into, for example, the dB domain, and is then converted into a fixed number of data by the data number conversion main body 13. The data number conversion main body 13 includes a spectral envelope expansion unit 14, a band-limited FIR filter 15, and a linear interpolation unit 16.
[0022]
The variable number M of input data in each block is nonlinearly compressed by the nonlinear compression unit 12, and the spectral envelope expansion unit 14 then extends the spectral envelope forward and backward by repeating the values at both ends. The extended envelope is supplied to the FIR filter 15. Of a plurality of coefficient sets corresponding to a plurality of different phases relative to the sample points of the input data, the FIR filter 15 uses the coefficient set corresponding to a position in the vicinity of each position of the fixed number of data to obtain intermediate output data. This intermediate output data is supplied to the linear interpolation unit 16, where it is linearly interpolated into the fixed number of data required as the final output, and is output from the output terminal 17.
[0023]
Here, let a[m] be the sequence of M (= m_MX + 1) amplitude data calculated in the MBE vocoder described later. m is the order of the harmonics, or band number, and m_MX is its maximum value; including the amplitude data of the band m = 0, the number of amplitude data over all bands is m_MX + 1. The amplitude data a[m] is converted by the nonlinear compression unit 12 into, for example, the dB domain. That is, denoting the resulting data by a_dB[m],
$$a_{dB}[m] = 20 \log_{10} a[m] \tag{1}$$
The number m_MX + 1 of the log-converted amplitude data a_dB[m] varies with the pitch as described above, so the data is converted into a fixed number of amplitude data b[m]. This is a kind of sampling rate conversion. The compression in the nonlinear compression unit 12 is not limited to logarithmic compression into the dB domain; pseudo-logarithmic compression such as so-called μ-law or A-law may also be used. Compressing the amplitude in this way enables efficient encoding.
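As a minimal illustration of equation (1), the dB conversion and its inverse can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes strictly positive amplitudes, since the logarithm is undefined at zero.

```python
import math


def to_db(amplitudes):
    """Equation (1): a_dB[m] = 20 * log10(a[m]); assumes every a[m] > 0."""
    return [20.0 * math.log10(a) for a in amplitudes]


def from_db(values_db):
    """Inverse of equation (1), as would be needed on the decoder side."""
    return [10.0 ** (v / 20.0) for v in values_db]


a = [1.0, 10.0, 100.0, 0.5]
a_db = to_db(a)
restored = from_db(a_db)
```

A practical implementation would need some floor on the amplitude before taking the logarithm; the source does not say how this is handled.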
[0024]
The sampling frequency f_s of the time-axis audio signal input to the MBE vocoder is normally 8 kHz, and the full bandwidth is 3.4 kHz (the effective band being 200 to 3400 Hz). From a high female voice to a low male voice, the pitch lag (the number of samples corresponding to the pitch period) ranges from about 20 to 147. The pitch frequency therefore varies from about 8000/147 ≈ 54 Hz to about 8000/20 = 400 Hz, so that about 8 to 63 pitch pulses (harmonics) stand in the band up to 3.4 kHz on the frequency axis. That is, the dB-domain waveform on the frequency axis, consisting of m_MX + 1 data (8 to 63 samples), is converted into a fixed number of samples, for example 44. This corresponds to obtaining samples at each harmonic position of a fixed pitch frequency (angular frequency) ω_c, as shown in FIG. 2C.
[0025]
Next, the spectral envelope expansion unit 14 extends forward and backward the values at both ends of the spectral envelope of m_MX + 1 data, which has been nonlinearly compressed by the nonlinear compression unit 12 as described above and is represented by the array a_dB[m]. This is done to prevent ringing at the ends of the spectral envelope. Denoting the resulting sequence by a_JdB[m], a_JdB[m] is defined for −(f₀ − 1)/2 ≦ m < M + (f₀ − 1)/2 as
[0026]
[Expression 1]
$$a_{JdB}[m] = \begin{cases} a_{dB}[0], & -(f_0-1)/2 \le m < 0 \\ a_{dB}[m], & 0 \le m < M \\ a_{dB}[M-1], & M \le m < M + (f_0-1)/2 \end{cases} \tag{2}$$
[0027]
Here f₀ is, for example, 9, and is a constant related to the order F₀ (for example, 65) of the FIR filter used next, as viewed at the sampling rate after oversampling, by F₀ = O_S × (f₀ − 1) + 1. When the spectral envelope expansion is regarded as a kind of oversampling, f₀ − 1 is the filter order viewed at the sampling rate before oversampling, and F₀ is the filter order viewed at the sampling rate after oversampling. O_S is the oversampling ratio. FIG. 3 shows this a_JdB[m]. That is, a_JdB[m] is the original waveform a_dB[m] in the interval 0 ≦ m < M, extended at the left end by holding the first data a_dB[0] down to −(f₀ − 1)/2 and at the right end by holding the last data a_dB[M − 1] up to M + (f₀ − 1)/2.
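The end-value extension performed by the spectral envelope expansion unit 14 can be sketched as follows. This is only an illustration under the stated example value f₀ = 9; the function name and the choice to return an index offset (so that negative indices m can be addressed in a Python list) are assumptions of this sketch.

```python
def extend_envelope(a_db, f0=9):
    """Repeat the end values of a_dB[m] by (f0 - 1) / 2 samples on each side.

    Returns (a_j, offset), where a_j[m + offset] holds a_JdB[m] for
    -(f0 - 1) / 2 <= m < M + (f0 - 1) / 2.
    """
    half = (f0 - 1) // 2
    a_j = [a_db[0]] * half + list(a_db) + [a_db[-1]] * half
    return a_j, half


a_j, offset = extend_envelope([1.0, 2.0, 3.0])
# a_JdB[-4] is addressed as a_j[-4 + offset], a_JdB[0] as a_j[offset], and so on.
```

With f₀ = 9 the extended sequence has M + 8 entries, four held values on each side of the original M.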
[0028]
Band-limited oversampling would ordinarily pass data stuffed with (O_S − 1) zeros through a filter of order F₀, but the product-sum terms for the zero data can be ignored. Band-limited oversampling can therefore be regarded as filtering with eight phase coefficient sets (P = 0 ... 7), each consisting of (f₀ − 1) coefficients obtained by thinning out the original F₀ coefficients by the oversampling ratio.
[0029]
FIG. 4 shows the phase coefficients of the FIR filter 15 when F₀ is 65, f₀ is 9, and O_S is 8. A of FIG. 4 shows, as amplitude values, the magnitudes of the F₀ (= 65) coefficients over the phase range from −4π to 4π. The coefficient value is 1 at a phase of 0π and 0 at phases of ±4π, ±3π, ±2π, and ±π. The curve in A of FIG. 4 is also symmetric about 0π. B of FIG. 4 shows where in A of FIG. 4 the coefficient values of the coefficient set for each phase P = 0 ... 7 lie. These coefficient values can be derived by a known method.
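The text leaves the coefficient design to "a known method". One common choice that has the stated properties (value 1 at 0π, zeros at ±π ... ±4π, symmetry about 0π) is a Hamming-windowed sinc; the Python sketch below uses that design purely as an assumption, and the function names are illustrative. The split into phase sets follows the indexing coef[8 − P], coef[16 − P], ..., coef[64 − P] used later in the text.

```python
import math


def design_coefficients(f0=9, o_s=8):
    """Assumed design: Hamming-windowed sinc with F0 = o_s * (f0 - 1) + 1 taps."""
    big_f0 = o_s * (f0 - 1) + 1
    center = (big_f0 - 1) // 2
    coef = []
    for k in range(big_f0):
        t = (k - center) / o_s                     # position in input-sample units
        sinc = 1.0 if k == center else math.sin(math.pi * t) / (math.pi * t)
        window = 0.54 + 0.46 * math.cos(2.0 * math.pi * (k - center) / (big_f0 - 1))
        coef.append(sinc * window)
    return coef


def phase_set(coef, p, f0=9, o_s=8):
    """Coefficient set for phase P: coef[j * o_s - P] for j = 1 ... f0 - 1."""
    return [coef[j * o_s - p] for j in range(1, f0)]


coef = design_coefficients()
sets = [phase_set(coef, p) for p in range(8)]
```

Each of the eight sets holds eight coefficients, so one output sample costs eight multiplications regardless of which phase is selected.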
[0030]
In the present invention, of a plurality of coefficient sets corresponding to a plurality of different phases relative to the sample points of the input data, only the coefficient set corresponding to each position of the fixed number of data is used, so that only the data required as output, or data in its vicinity, is computed. The amount of computation is reduced by thinning out the computation itself.
[0031]
FIG. 5 is a diagram for explaining the calculation in which a_JdB[m] is filtered using the coefficient sets (P = 0 ... 7) shown in FIG. 4 to obtain any one of the fixed number of data b[m] required as output.
[0032]
A of FIG. 5 shows a_dB[m]. To obtain b[m] from this a_dB[m], the spectral envelope expansion unit 14 first obtains a_JdB[m] from a_dB[m]. Here i is the index of the variable number M of data.
[0033]
For example, consider obtaining b[m] at the point b shown in FIG. 5A.
[0034]
The coefficient set closest to the point b is the coefficient set of P = 2, whose coefficient values, shown in part B of the figure, are denoted p₂₀, p₂₁, p₂₂, p₂₃, p₂₄, p₂₅, p₂₆, and p₂₇. Then b[m] at the point b is the sum of eight products: the data at index i = 0 times p₂₀, the data at i = 1 times p₂₁, the data at i = 2 times p₂₂, the data at i = 3 times p₂₃, the data at i = 4 times p₂₄, the data at i = 5 times p₂₅, the data at i = 6 times p₂₆, and the data at i = 7 times p₂₇.
[0035]
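Now, let the F₀ coefficients be coef[k] in the range 0 ≦ k < F₀. Then b[m] in the range 0 ≦ m < M · O_S, with m = i × O_S + P, is given by the following equation (3).
[0036]
[Expression 2]
$$b[i \cdot O_S + P] = \sum_{j=1}^{f_0-1} \mathrm{coef}[j \cdot O_S - P]\; a_{JdB}\!\left[i + j - \frac{f_0-1}{2}\right] \tag{3}$$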
[0037]
From equation (3), when O_S = 8, b[m] at the point b, for example, is b[3 × 8 + 2] = b[26], and the amplitude value of the sample data b[26] is obtained.
[0038]
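With O_S = 8 and f₀ = 9, equation (3) becomes
[0039]
[Equation 3]
$$b[8i + P] = \sum_{j=1}^{8} \mathrm{coef}[8j - P]\; a_{JdB}[i + j - 4] \tag{4}$$
[0040]
so that each output sample b[8i + P] is a sum of eight products.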
[0041]
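When P = 0, equation (4) becomes
[0042]
[Expression 4]
$$b[8i] = \sum_{j=1}^{8} \mathrm{coef}[8j]\; a_{JdB}[i + j - 4] = a_{JdB}[i] \tag{5}$$
[0043]
because, among the coefficients coef[8j], only the one at phase 0π is non-zero (equal to 1). Thus the amplitude of the data at i = 0, 1, ..., 7 is obtained as it is.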
[0044]
When P = 1, equation (4) becomes
[0045]
[Equation 5]
$$b[8i + 1] = \sum_{j=1}^{8} \mathrm{coef}[8j - 1]\; a_{JdB}[i + j - 4] \tag{6}$$
[0046]
In general, it can be seen that the eight data a_JdB[i − 3], a_JdB[i − 2], a_JdB[i − 1], a_JdB[i], a_JdB[i + 1], a_JdB[i + 2], a_JdB[i + 3], and a_JdB[i + 4] are multiplied by the eight coefficients coef[8 − P], coef[16 − P], coef[24 − P], coef[32 − P], coef[40 − P], coef[48 − P], coef[56 − P], and coef[64 − P], respectively, and the eight products are summed to obtain b[m].
[0047]
For example, for b[m] at the point b shown in FIG. 5A, i = 3 and P = 2, so the eight data a_JdB[0], a_JdB[1], ..., a_JdB[7] are multiplied by the eight coefficients coef[6], coef[14], ..., coef[62], respectively, and the eight products are summed to obtain b[26].
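The per-point filtering described above (eight data values multiplied by coef[8 − P], coef[16 − P], ..., coef[64 − P] and summed) can be sketched as follows. The prototype filter is again an assumed Hamming-windowed sinc, since the source only refers to a known design method, and all names are illustrative. Clamping the data index to the range 0 ... M − 1 is used here as an equivalent of the end-value extension done by the spectral envelope expansion unit 14.

```python
import math

O_S = 8                                   # oversampling ratio
F0_SMALL = 9                              # f0
F0_LARGE = O_S * (F0_SMALL - 1) + 1       # F0 = 65
HALF = (F0_SMALL - 1) // 2                # 4 samples of extension per side


def windowed_sinc():
    """Assumed prototype filter: 1 at the center, 0 at +/-pi ... +/-4pi."""
    center = (F0_LARGE - 1) // 2
    coef = []
    for k in range(F0_LARGE):
        t = (k - center) / O_S
        sinc = 1.0 if k == center else math.sin(math.pi * t) / (math.pi * t)
        coef.append(sinc * (0.54 + 0.46 * math.cos(math.pi * t / HALF)))
    return coef


def fir_point(a_db, coef, i, p):
    """b[i * O_S + P]: sum over j = 1 ... f0 - 1 of
    coef[j * O_S - P] * a_JdB[i + j - (f0 - 1) / 2]."""
    last = len(a_db) - 1
    total = 0.0
    for j in range(1, F0_SMALL):
        n = min(max(i + j - HALF, 0), last)   # clamping == repeating the end values
        total += coef[j * O_S - p] * a_db[n]
    return total


coef = windowed_sinc()
envelope = [float(v) for v in range(10, 20)]
b26 = fir_point(envelope, coef, 3, 2)     # the point b of the example: i = 3, P = 2
```

Only the requested output sample is evaluated, which is the thinning-out of the computation that the text relies on.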
[0048]
For example, to obtain b[m] with i = 0 and P = 3, the three data a_JdB[−3], a_JdB[−2], and a_JdB[−1] produced by the spectral envelope expansion unit 14 are multiplied by the three coefficients coef[5], coef[13], and coef[21], respectively, and the five data a_JdB[0], a_JdB[1], a_JdB[2], a_JdB[3], and a_JdB[4] are multiplied by the five coefficients coef[29], coef[37], coef[45], coef[53], and coef[61], respectively; the resulting eight products are summed to obtain b[3].
[0049]
Here, the index inside the brackets of a_JdB[i + j − (f₀ − 1)/2] can fall as low as 1 − (f₀ − 1)/2 (for i = 0 and j = 1) and rise as high as m_MX + (f₀ − 1)/2 (for i = m_MX and j = f₀ − 1), but this causes no problem because the data has been extended by the spectral envelope expansion unit 14. The number of products required to obtain one point of b[m] is (f₀ − 1).
[0050]
The above description assumed that each position of the fixed number of data finally required (for example, 44) coincides with one of the oversampled points (for example, at 8 times). In practice, obtaining such a coincidence would require an extremely high oversampling ratio, and the number of filter coefficients would become enormous. It is therefore preferable to obtain, as intermediate outputs of the above filtering operation, the data at oversampled points in the vicinity of each finally required data position (for example, the two points before and after it), and to interpolate these intermediate outputs to obtain the finally required data.
[0051]
That is, the FIR output of the FIR filter 15 is supplied to the linear interpolation unit 16, which linearly interpolates at least two FIR outputs to obtain each required output point. For example, to obtain a point A₀ (see the corresponding figure) by linear interpolation, it suffices for the FIR filter to compute the two points A₋₁ and A₁ on either side of A₀. Accordingly, if the data number conversion main body 13 in the encoder finally outputs 44 points, the FIR filter 15 need compute only 44 × 2 (= 88) points.
[0052]
The process of obtaining b[m] at the two points A₋₁ and A₁ needed for each required output point will be described with reference to the flowchart of FIG. 7.
[0053]
In step S1, the angular frequency on the input side is divided by O_S (the oversampling ratio) to obtain ω_0f, and the angular frequency ω₀ on the output side is obtained. In the first embodiment, O_S-times (for example, 8-times) oversampling is performed, so the spectrum stands at intervals of 1/O_S of the input angular frequency; ω_0f is therefore obtained by dividing by O_S. For example, when 0 to π is represented by a grid of 1024, ω_0f is 1024/M × 1/O_S. The angular frequency of the desired points (output side) is denoted ω₀, and ω₀ is 1024/M′, where M′ is the number of harmonics on the output side.
In step S2, the input-side harmonics index i and the output-side harmonics index ii are initialized.
[0054]
In step S3, the coefficient set P is initialized.
[0055]
In step S4, the position A₀ of the data to be obtained is searched for (scanned) using the input-side harmonics index i and the coefficient set P. That is, it is determined whether the scan position A₁ (= i × O_S + P + 1) given by i and P has passed the position A₀ (= ω₀ × ii) of the data to be obtained. Since i and P were initialized in steps S2 and S3, the search starts with i = 0 and P = 0. If YES is determined here, the process proceeds to step S5. If NO is determined, the process proceeds to step S7.
[0056]
In step S5, b[m] at the scan position A₁ (= i × O_S + P + 1) that has passed the position A₀ (= ω₀ × ii) of the data to be obtained, that is, b[i × O_S + P + 1], and b[m] at the position immediately before it (A₋₁), that is, b[i × O_S + P], are obtained. These b[i × O_S + P + 1] and b[i × O_S + P] are the values of b[m] at the positions (A₁ and A₋₁) on either side of the position A₀ (= ω₀ × ii) of the data to be obtained.
[0057]
In step S6, the index ii of the output harmonics is incremented in order to move the position of the data to be obtained next.
[0058]
In step S7, the coefficient set P is incremented to move the scan position. At this time, i remains 0. That is, P is changed from 0 to 1 while i = 0.
[0059]
In step S8, it is determined whether the coefficient set P has reached O_S. P takes the eight values 0 ... 7, and O_S is also 8. If YES is determined here, the process proceeds to step S9. If NO is determined, the process proceeds to step S4.
[0060]
In step S9, the index i of the input side harmonics is incremented. Then, the process proceeds to step S10.
[0061]
In step S10, it is determined whether i is equal to the variable number of data (M). If YES is determined here, the flow ends. If NO is determined, the process returns to step S3.
[0062]
According to the above flowchart, in this embodiment the frequency is advanced in steps of the O_S-times (here O_S = 8) oversampled pitch (angular frequency) ω_0f, and b[m] at the point just beyond each desired point and b[m] at the point immediately before it are obtained. In this way, all the left and right points necessary for obtaining the output points by linear interpolation are calculated.
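The scan of steps S1 to S10 can be sketched in Python as below. This is an interpretation, not the flowchart itself: it assumes that the scan position (i × O_S + P + 1) is measured in units of ω_0f and the target position ii in units of ω₀, and it compares the two by cross-multiplication in integers so that no rounding error can affect the decision. The function name and the returned list of index pairs are illustrative.

```python
def bracketing_points(m_in, m_out=44, o_s=8):
    """Return, for each output point ii, the pair (m_before, m_after) of
    oversampled indices whose b[m] values are needed for interpolation."""
    pairs = []
    ii = 0                                     # S2: output-side index
    for i in range(m_in):                      # S9 / S10: input-side index
        for p in range(o_s):                   # S3 / S7 / S8: coefficient set
            scan = i * o_s + p + 1             # S4: scan position A1
            # A1 * w0f > ii * w0, with w0f = grid / (m_in * o_s) and
            # w0 = grid / m_out, cross-multiplied to stay in integers.
            if scan * m_out > ii * m_in * o_s:
                pairs.append((scan - 1, scan))  # S5: points A-1 and A1
                ii += 1                         # S6: move to the next target
    return pairs


pairs_low = bracketing_points(8)     # fewest harmonics in the example range
pairs_high = bracketing_points(63)   # most harmonics in the example range
```

For the example range M = 8 to 63 with 44 output points, the scan step is finer than the output spacing, so at most one output point is passed per scan step and each output point receives exactly one bracketing pair.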
[0063]
Next, the operation in which the linear interpolation unit 16 linearly interpolates between the b[m] just beyond each desired output point and the b[m] immediately before it, both obtained by the process of the flowchart shown in FIG. 7, will be described with reference to a flowchart.
[0064]
In step S21, the output angular frequency ω₀ and the input angular frequency ω_0f are obtained. This is the same as step S1 shown in FIG. 7.
[0065]
In step S22, i is initialized, because the subsequent flow advances an index i that runs over eight times the number of input-side harmonics.
[0066]
In step S23, it is determined whether i = 0. If YES is determined here, the process proceeds to step S24, and if NO is determined, the process proceeds to step S25.
[0067]
In steps S24 and S25, attention is paid to one section as shown in the figure, whose width is denoted b_w, upper limit u_b, and lower limit l_b. The upper limit u_b is inint((i + 1) × ω_0f) and the lower limit l_b is inint(i × ω_0f), where inint(x) is a function that returns the integer closest to x. The lower limit l_b of each section is also the upper limit u_b of the preceding section. The width b_w is therefore the difference between the upper limit u_b and the lower limit l_b.
[0068]
In step S24, the lower limit l_b is set for the first section (i = 0), and the process proceeds to step S26.
[0069]
In step S25, the lower limit l_b is set equal to the previous upper limit u_b.
[0070]
In step S26, the upper limit u_b is set to inint((i + 1) × ω_0f) as described above.
[0071]
In step S27, the width b_w, which is the difference between the upper limit u_b and the lower limit l_b, is obtained. This b_w is used to obtain the linear interpolation value c[ii].
[0072]
In step S28, the difference i_dx between the position of the c[ii] to be obtained, shown in the figure, and the lower limit l_b is set to 0. The position where i_dx = 0 (the lower limit l_b) is set as the scan start position.
[0073]
In step S29, the scan j is started from the lower limit l_b as described above.
[0074]
In step S30, it is determined whether or not the scan j matches the position of c [ii] to be obtained. If YES is determined here, the process proceeds to step S31. If NO is determined, the process proceeds to step S32.
[0075]
In step S31, c[ii] is calculated with weighting according to the positional relationship. For example, when i_dx is 0, c[ii] = b[i], and when i_dx is b_w, c[ii] = b[i + 1].
In step S32, i_dx is incremented. In step S33, it is determined whether the output-side harmonics index ii is larger than the number M′ of output-side harmonics. If YES is determined here, the flow ends. If NO is determined, the process proceeds to step S34.
[0076]
In step S34, the scan over j is repeated.
[0077]
In step S35, it is determined whether the scan j has reached the upper limit u_b. If YES is determined here, the process proceeds to step S36, and if NO is determined, the process returns to step S30.
[0078]
In step S36, the input-side harmonics index i is incremented.
[0079]
In step S37, it is determined whether i is larger than the product of the number of input harmonics M and O_S. If YES is determined here, the flow ends. If NO is determined, the process returns to step S23.
[0080]
According to the above flowchart, in this embodiment only the necessary points are obtained, by linearly interpolating the b[m] values obtained by the process of the flowchart of FIG. 7.
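A simplified Python sketch of this interpolation follows. It keeps the quantities named in the flowchart (l_b, u_b, b_w, i_dx, inint) but flattens the control flow into two loops, and the weighting formula is an assumption chosen so that c[ii] = b[i] when i_dx = 0 and c[ii] = b[i + 1] when i_dx = b_w, as the text states. For clarity it takes the full oversampled sequence b as input, although in the method only the bracketing values would actually be computed.

```python
import math


def inint(x):
    """Nearest integer to x (halves rounded up)."""
    return int(math.floor(x + 0.5))


def linear_interpolate(b, m_in, m_out=44, o_s=8, grid=1024):
    """Interpolate c[ii] at grid positions inint(ii * w0) from b[m],
    whose samples sit at grid positions inint(m * w0f)."""
    w0f = grid / (m_in * o_s)          # spacing of the oversampled points
    w0 = grid / m_out                  # spacing of the wanted output points
    c = []
    ii = 0
    for i in range(m_in * o_s - 1):    # section between b[i] and b[i + 1]
        l_b = inint(i * w0f)
        u_b = inint((i + 1) * w0f)
        b_w = u_b - l_b
        for j in range(l_b, u_b):      # scan j across the section
            if ii < m_out and j == inint(ii * w0):
                i_dx = j - l_b
                c.append(((b_w - i_dx) * b[i] + i_dx * b[i + 1]) / b_w)
                ii += 1
    return c


ramp = [float(m) for m in range(8 * 8)]      # b[m] = m for M = 8
c_ramp = linear_interpolate(ramp, 8)
```

With M = 8 the oversampled points fall every 16 grid units, so the second output point (grid position 23) lies 7/16 of the way from b[1] to b[2].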
[0081]
As described above, in the first embodiment a variable number of data can be converted into a fixed number by computing only the necessary points, so the amount of computation is reduced.
[0082]
The sequence thus converted into a fixed number of samples may, as necessary, be differenced between blocks or frames and then vector quantized, and the resulting index transmitted.
[0083]
The first embodiment described above converts the spectral-envelope amplitude data calculated by the MBE vocoder, whose number is variable, into a fixed number of data. A second embodiment, described below, is a data number conversion that converts data already converted into a fixed number back into a number of data corresponding to the data content. The second embodiment is applied, for example, on the decoder side, which synthesizes the audio signal. That is, on the decoder side, a fixed number of waveform data, obtained from the index by vector inverse quantization, is converted by the same method, namely band-limited oversampling and linear interpolation, into a sequence of M data corresponding to the data content.
[0084]
FIG. 9 shows a schematic configuration of the second embodiment.
[0085]
In the second embodiment, a fixed number of input data is supplied via the input terminal 21 to the data number conversion main body 22, which converts it into a variable number of data and outputs it from the output terminal 26. The data number conversion main body 22 includes a spectral envelope expansion unit 23, a band-limited FIR filter 24, and a linear interpolation unit 25.
[0086]
For the fixed number of input data in each block, the spectral envelope expansion unit 23 extends the values at both ends of the spectral envelope forward and backward. The extended envelope, with its increased number of data, is supplied to the FIR filter 24. Of a plurality of coefficient sets corresponding to a plurality of different phases relative to the sample points of the data, the FIR filter 24 uses the coefficient set corresponding to a position in the vicinity of each position of the fixed number of data to obtain intermediate output data. The intermediate output data is supplied to the linear interpolation unit 25, which linearly interpolates and thins out the intermediate output data, and a variable number of data corresponding to the data content is output from the output terminal 26.
[0087]
In the second embodiment, computing only the necessary points allows the fixed number of data to be converted into a number corresponding to the data content, so the amount of computation is reduced.
[0088]
As for the number of multiplications in the first embodiment, if the number of data to be obtained is 44, eight multiplications are performed for each of 88 data (twice that number), giving 704 (= 88 × 8) multiplications. This is about 1/45 of the 31488 multiplications required in total by the FFT and IFFT even with the speed-up technique described above. In the second embodiment, if the number of data to be obtained is 60, eight multiplications are performed for each of 120 data (twice that number), giving 960 (= 120 × 8) multiplications. This is roughly 1/30 of the 31488 multiplications required in total by the FFT and IFFT with the speed-up technique described above.
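The comparison can be reproduced with a few lines of Python. The constant 31488 is the total quoted earlier for the speeded-up FFT and IFFT; the function name is illustrative.

```python
FFT_IFFT_MULTS = 31488     # speeded-up 256-point FFT + 2048-point IFFT (from the text)


def fir_mults(n_out, f0=9):
    """Two bracketing FIR points per output, (f0 - 1) products per point."""
    return 2 * n_out * (f0 - 1)


encoder_mults = fir_mults(44)                    # first embodiment
decoder_mults = fir_mults(60)                    # second embodiment
encoder_ratio = FFT_IFFT_MULTS / encoder_mults   # how many times fewer products
decoder_ratio = FFT_IFFT_MULTS / decoder_mults
print(encoder_mults, decoder_mults, round(encoder_ratio, 1), round(decoder_ratio, 1))
```

The encoder ratio comes out at about 44.7, matching the stated 1/45; the decoder ratio comes out at about 32.8, which the text rounds to roughly 1/30.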
[0089]
Next, a specific example of an MBE (Multiband Excitation) vocoder, a kind of speech analysis-synthesis coding apparatus (so-called vocoder) to which the data number conversion method described above can be applied, will be described with reference to the drawings.
[0090]
The MBE vocoder described below is disclosed in D. W. Griffin and J. S. Lim, "Multiband Excitation Vocoder," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 36, No. 8, pp. 1223-1235, Aug. 1988. Conventional PARCOR (PARtial auto-CORrelation) vocoders and the like switch between voiced and unvoiced sections for each block or frame when modeling speech, whereas the MBE vocoder models speech on the assumption that voiced and unvoiced regions exist in the frequency-axis domain at the same time (within the same block or frame).
[0091]
FIG. 10 is a block diagram showing an overall schematic configuration of an embodiment in which the present invention is applied to the MBE vocoder.
[0092]
In FIG. 10, an audio signal is supplied to an input terminal 101 and sent to a filter 102 such as an HPF (high-pass filter), where at least the low-frequency components (200 Hz or less) are removed in order to remove the so-called DC (direct current) offset and to limit the band (for example, to 200 to 3400 Hz). The signal from the filter 102 is sent to a pitch extraction unit 103 and a windowing processing unit 104. In the pitch extraction unit 103, the input audio signal data is divided into blocks of a predetermined number of samples N (for example, N = 256) (that is, cut out with a rectangular window), and pitch extraction is performed on the audio signal in each block. As shown for example in FIG. 11A, these cut-out blocks (256 samples) are shifted along the time axis at a frame interval of L samples (for example, L = 160), so that adjacent blocks overlap by N − L samples (for example, 96 samples). The windowing processing unit 104 applies a predetermined window function, for example a Hamming window, to each block of N samples, and the windowed block is likewise shifted along the time axis at intervals of one frame of L samples.
[0093]
Such windowing can be expressed as
$$x_w(k, q) = x(q)\, w(kL - q) \tag{7}$$
In equation (7), k is the block number and q is the time index (sample number) of the data; the equation shows that windowing the q-th data x(q) of the unprocessed input signal with the window function w(kL − q) of the k-th block yields the data x_w(k, q). The window function w_r(r) for a rectangular window (shown in the drawings) is
$$w_r(r) = \begin{cases} 1, & 0 \le r < N \\ 0, & r < 0,\ N \le r \end{cases} \tag{8}$$
and the window function w_h(r) for a Hamming window (shown in the drawings) is
$$w_h(r) = \begin{cases} 0.54 - 0.46 \cos\bigl(2\pi r/(N-1)\bigr), & 0 \le r < N \\ 0, & r < 0,\ N \le r \end{cases} \tag{9}$$
When such a window function w_r(r) or w_h(r) is used, the non-zero interval of the window function w(r) (= w(kL − q)) in equation (7) is
0 ≦ kL − q < N
which can be rewritten as
kL − N < q ≦ kL
Thus, for the rectangular window for example, w_r(kL − q) = 1 when kL − N < q ≦ kL. Equations (7) to (9) also show that the window of length N (= 256) samples advances L (= 160) samples at a time. In the following, the non-zero sample sequences of N points (0 ≦ r < N) cut out by the window functions of equations (8) and (9) are denoted x_wr(k, r) and x_wh(k, r), respectively.
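The block extraction of equation (7) can be sketched in Python as follows, with N = 256 and L = 160 as in the example. The Hamming window is written in its standard form 0.54 − 0.46 cos(2πr/(N − 1)), which is an assumption here; treating samples outside the input signal as zero is likewise an assumption, and the function names are illustrative.

```python
import math

N = 256    # block length in samples
L = 160    # frame interval in samples


def w_rect(r, n=N):
    """Rectangular window: 1 for 0 <= r < N, otherwise 0."""
    return 1.0 if 0 <= r < n else 0.0


def w_hamming(r, n=N):
    """Standard Hamming window on 0 <= r < N, otherwise 0 (assumed form)."""
    if 0 <= r < n:
        return 0.54 - 0.46 * math.cos(2.0 * math.pi * r / (n - 1))
    return 0.0


def windowed_block(x, k, window, n=N, frame=L):
    """x_w(k, r) for r = 0 ... N - 1, with r = kL - q as in equation (7).
    Samples outside the input are treated as zero (assumption)."""
    block = []
    for r in range(n):
        q = k * frame - r
        sample = x[q] if 0 <= q < len(x) else 0.0
        block.append(sample * window(r, n))
    return block


signal = [float(q) for q in range(1000)]
x_wr = windowed_block(signal, 2, w_rect)      # covers kL - N < q <= kL for k = 2
x_wh = windowed_block(signal, 2, w_hamming)
```

Because r = kL − q, the index r runs backwards in time: r = 0 corresponds to the newest sample q = kL of the block.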
[0094]
In the windowing processing unit 104, as shown in FIG. 13, 1792 samples of zero data are appended to the 256-sample block x_wh(k, r) to which the Hamming window of equation (9) has been applied, giving 2048 samples (so-called zero padding), and the orthogonal transform unit 105 applies an orthogonal transform such as an FFT (fast Fourier transform) to the result. Alternatively, the FFT may be performed on the 256 points as they are, without zero padding.
[0095]
In the pitch extraction unit 103, pitch extraction is performed on the sample sequence x_wr(k, r) (one block of N samples). Known pitch extraction methods use the periodicity of the time waveform, the periodic frequency structure of the spectrum, or the autocorrelation function; this embodiment adopts the autocorrelation method applied to a center-clipped waveform. A single center clip level may be set per block, but here the peak level of the signal in each part (sub-block) obtained by subdividing the block is detected, and when the peak levels of the sub-blocks differ greatly, the clip level is changed stepwise or continuously within the block. The pitch period is determined from the peak positions of the autocorrelation data of the center-clipped waveform. Specifically, a plurality of peaks are found in the autocorrelation data belonging to the current frame (the autocorrelation being computed over one block of N samples). When the maximum of these peaks is at or above a predetermined threshold, the position of the maximum peak is taken as the pitch period. Otherwise, a peak is sought within a pitch range satisfying a predetermined relationship to the pitch found in a frame other than the current frame, for example the preceding or following frame (for example, within ±20% of the pitch of the preceding frame), and the pitch of the current frame is determined from the position of that peak. The pitch extraction unit 103 performs a relatively coarse open-loop pitch search, and the extracted pitch data is sent to a high-precision (fine) pitch search unit 106, where a closed-loop high-precision (fine) pitch search is performed.
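The core of the coarse search, autocorrelation of a center-clipped waveform, can be sketched in Python as below. This is a deliberately reduced illustration: it uses a single clip level per block, and it omits the sub-block adaptive clip level, the peak threshold, and the fallback to the previous frame's pitch range described above. The clip ratio of 0.6 and the function names are assumptions of the sketch; the lag range 20 to 147 is the pitch lag range given earlier in the text.

```python
import math


def center_clip(x, ratio=0.6):
    """Zero samples below the clip level and shift the rest toward zero.
    One clip level per block; the ratio is an assumed value."""
    level = ratio * max(abs(v) for v in x)
    clipped = []
    for v in x:
        if v > level:
            clipped.append(v - level)
        elif v < -level:
            clipped.append(v + level)
        else:
            clipped.append(0.0)
    return clipped


def coarse_pitch_lag(x, min_lag=20, max_lag=147):
    """Integer lag with the largest autocorrelation of the clipped block."""
    y = center_clip(x)
    n = len(y)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(y[t] * y[t + lag] for t in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# Synthetic block of N = 256 samples with a pitch period of 40 samples.
block = [math.sin(2.0 * math.pi * t / 40.0) for t in range(256)]
lag = coarse_pitch_lag(block)
```

For this synthetic block the search returns the true period of 40 samples, which at a sampling frequency of 8 kHz corresponds to a pitch of 200 Hz.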
[0096]
The high-precision (fine) pitch search unit 106 is supplied with the integer-valued coarse pitch data extracted by the pitch extraction unit 103 and with the frequency-axis data produced, for example by FFT, by the orthogonal transform unit 105. The unit varies the pitch by ± several samples in steps of 0.2 to 0.5 around the coarse pitch value and converges on the optimum fine pitch value, which has a fractional part (floating point). The fine search uses the so-called analysis-by-synthesis method: the pitch is selected so that the synthesized power spectrum is closest to the power spectrum of the original sound.
[0097]
This pitch fine search will now be described. First, in the MBE vocoder, the spectrum data S(j) on the frequency axis obtained by the orthogonal transform such as FFT is assumed to be modeled as
$$S(j) = H(j)\,|E(j)|, \qquad 0 < j < J \tag{10}$$
Here J corresponds to ω_s/4π = f_s/2, which is 4 kHz when the sampling frequency f_s = ω_s/2π is, for example, 8 kHz. In equation (10), when the spectrum data S(j) on the frequency axis has a waveform such as that shown in FIG. 14A, H(j) represents the spectrum envelope of the original spectrum data S(j) as shown in FIG. 14B, and E(j) represents the spectrum of an equal-level, periodic excitation signal as shown in FIG. 14C. That is, the FFT spectrum S(j) is modeled as the product of the spectrum envelope H(j) and the power spectrum |E(j)| of the excitation signal.
[0098]
The power spectrum |E(j)| of the excitation signal is formed by arranging the spectrum waveform corresponding to one band so that it repeats for each band on the frequency axis, taking into account the periodicity (pitch structure) of the frequency-axis waveform determined by the pitch. The waveform for one band can be formed, for example, by regarding the waveform obtained by appending 1792 samples of zero data to the 256-sample Hamming window function, as shown in FIG. 13, as a time-axis signal, applying an FFT to it, and cutting out the resulting impulse waveform, which has a certain bandwidth on the frequency axis, according to the pitch.
[0099]
Next, for each band divided according to the pitch, a value (a kind of amplitude) |A_m| that represents H(j), that is, one that minimizes the error in each band, is obtained. Let the lower and upper limit points of the m-th band (the band of the m-th harmonic) be a_m and b_m, respectively. The error ε_m of the m-th band is then
[0100]
[Formula 6]

[0101]
It can be expressed as shown above. The value of |A_m| that minimizes this error ε_m is
[0102]
[Expression 7]

[0103]
When |A_m| takes the value given by this equation (12), the error ε_m is minimized. Such an amplitude |A_m| is obtained for each band, and the error ε_m for each band defined by equation (11) is computed using each amplitude |A_m| thus obtained. Next, the sum Σε_m of these per-band errors over all bands is computed. This all-band error sum Σε_m is computed for several slightly different pitches, and the pitch that minimizes the error sum Σε_m is found.
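The images of equations (11) and (12) are not reproduced in this text. Assuming the usual least-squares form implied by the model of equation (10) (an assumption made here for illustration, not a quotation of the original formulas), the band error would read

$$\varepsilon_m = \sum_{j=a_m}^{b_m} \bigl(|S(j)| - |A_m|\,|E(j)|\bigr)^2$$

and setting its derivative with respect to |A_m| to zero would give the minimizing amplitude

$$|A_m| = \frac{\sum_{j=a_m}^{b_m} |S(j)|\,|E(j)|}{\sum_{j=a_m}^{b_m} |E(j)|^2}$$

Under this form, |A_m| is simply the projection of the band's magnitude spectrum onto the excitation spectrum of that band.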
[0104]
That is, several pitches are prepared above and below the coarse pitch obtained by the pitch extraction unit 103, for example in steps of 0.25, and the error sum Σε_m is obtained for each of these slightly different pitches. Once the pitch is fixed, the bandwidth is fixed, so the error ε_m of equation (11) can be obtained from equation (12) using the power spectrum |S(j)| of the frequency-axis data and the excitation signal spectrum |E(j)|, and its sum Σε_m over all bands can be obtained. This error sum Σε_m is obtained for each pitch, and the pitch corresponding to the minimum error sum is taken as the optimum pitch. In this way the high-precision pitch search unit 106 obtains the optimum fine pitch (for example, in steps of 0.25) and determines the amplitude |A_m| corresponding to that optimum pitch.
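The search loop itself can be sketched as below. The name `fine_pitch_search`, the ±1 sample span, and the caller-supplied `total_error` function are hypothetical; the text only states that candidates are spaced by, for example, 0.25 around the coarse pitch and that the candidate with the smallest error sum is selected.

```python
def fine_pitch_search(coarse_pitch, total_error, step=0.25, span=1.0):
    """Return the candidate pitch with the smallest all-band error sum.

    total_error is a caller-supplied function (hypothetical here) that
    returns the error sum over all bands for one candidate pitch.
    """
    n_steps = int(round(span / step))
    candidates = [coarse_pitch + i * step
                  for i in range(-n_steps, n_steps + 1)]
    return min(candidates, key=total_error)

# Toy error function whose minimum lies at a pitch of 40.25 samples.
best = fine_pitch_search(40, lambda p: (p - 40.25) ** 2)
```

In a real analyzer `total_error` would rebuild the band edges and the excitation spectrum for each candidate, which is why the number of candidates is kept small.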
[0105]
In the above description of the pitch fine search, all bands were assumed to be voiced in order to simplify the description. In the MBE vocoder, however, a model is adopted in which unvoiced regions can exist on the frequency axis at the same instant, as described above, so voiced/unvoiced discrimination must be performed for each band.
[0106]
The optimum pitch and amplitude |A_m| data from the high-precision pitch search unit 106 are sent to the voiced/unvoiced sound discrimination unit 107, where voiced/unvoiced discrimination is performed for each band. The NSR (noise-to-signal ratio) is used for this discrimination. That is, the NSR of the m-th band is
[0107]
[Equation 8]

[0108]
When this NSR value is larger than a predetermined threshold (for example, 0.3), that is, when the error is large, the approximation of |S(j)| by |A_m||E(j)| in that band can be judged to be poor (the excitation signal |E(j)| is inappropriate as a basis), and the band is determined to be UV (unvoiced). Otherwise, the approximation can be judged to be reasonably good, and the band is determined to be V (voiced).
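The per-band decision can be sketched as follows. The equation image for the NSR is not reproduced in this text, so the sketch assumes that the NSR is the least-squares fitting error of the band divided by the band's signal energy; the function names and the list-based spectra are likewise illustrative assumptions.

```python
def band_amplitude(s_mag, e_mag, a, b):
    """Least-squares amplitude of band [a, b] (assumed form)."""
    num = sum(s_mag[j] * e_mag[j] for j in range(a, b + 1))
    den = sum(e_mag[j] ** 2 for j in range(a, b + 1))
    return num / den

def band_nsr(s_mag, e_mag, a, b):
    """Fitting error of band [a, b] divided by its signal energy (assumed)."""
    amp = band_amplitude(s_mag, e_mag, a, b)
    err = sum((s_mag[j] - amp * e_mag[j]) ** 2 for j in range(a, b + 1))
    energy = sum(s_mag[j] ** 2 for j in range(a, b + 1))
    return err / energy

def is_voiced(s_mag, e_mag, a, b, threshold=0.3):
    """V when the NSR does not exceed the threshold, UV otherwise."""
    return band_nsr(s_mag, e_mag, a, b) <= threshold

excitation = [0.2, 1.0, 0.2]   # one harmonic lobe of |E(j)|
harmonic = [0.4, 2.0, 0.4]     # |S(j)| that matches the lobe shape
flat = [1.0, 1.0, 1.0]         # noise-like |S(j)| with no harmonic peak
```

With these toy spectra the harmonic band fits exactly and is classified as voiced, while the flat band leaves a large residual and is classified as unvoiced.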
[0109]
Next, the amplitude re-evaluation unit 108 is supplied with the frequency-axis data from the orthogonal transform unit 105, the amplitude |A_m| evaluated at the fine pitch from the high-precision pitch search unit 106, and the V/UV (voiced/unvoiced) discrimination data from the voiced/unvoiced sound discrimination unit 107. The amplitude re-evaluation unit 108 obtains the amplitude again for each band determined to be unvoiced (UV) by the voiced/unvoiced sound discrimination unit 107. The amplitude |A_m|_UV for such a UV band is
[0110]
[Equation 9]

[0111]
It is obtained by this equation.
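The equation image for the UV amplitude is not reproduced in this text. A plausible form, assumed here purely for illustration, is the root-mean-square value of |S(j)| over the band, since an unvoiced band has no harmonic lobe to fit:

```python
import math

def unvoiced_amplitude(s_mag, a, b):
    """RMS of |S(j)| over band [a, b] (assumed form of the UV amplitude)."""
    energy = sum(s_mag[j] ** 2 for j in range(a, b + 1))
    return math.sqrt(energy / (b - a + 1))

# Example: a two-bin band with magnitudes 3 and 4.
uv_amp = unvoiced_amplitude([3.0, 4.0], 0, 1)
```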
[0112]
The data from the amplitude re-evaluation unit 108 is sent to the data number conversion (a kind of sampling rate conversion) unit 109. The data number conversion unit 109 makes the number of data constant, given that the number of bands into which the frequency axis is divided, and hence the number of data (in particular, the number of amplitude data), varies with the pitch. For example, when the effective band extends to 3400 Hz, it is divided into 8 to 63 bands according to the pitch, and the number m_MX + 1 of amplitude data |A_m| (including the amplitudes |A_m|_UV of the UV bands) obtained for these bands also varies from 8 to 63. The data number conversion unit 109 therefore converts this variable number m_MX + 1 of amplitude data into a fixed number (for example, 44) of data.
[0113]
In the first embodiment, as described with reference to FIGS. 1 to 8, the amplitude data for one block of the effective band on the frequency axis is lengthened by extending the data at both ends of the block, filtered by a band-limited FIR filter, and then linearly interpolated to obtain the fixed number (for example, 44) of data.
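The three steps (end extension, band-limited FIR filtering at only the needed oversampled points, and linear interpolation) can be sketched as follows. Everything specific in this sketch is an assumption: the 8x oversampling factor, the 8-tap Hann-windowed sinc coefficient sets, the edge-repeat extension, and the function names. The actual filter coefficients belong to FIGS. 4 and 5, which are not part of this section, and the nonlinear compression stage is omitted.

```python
import math

def coefficient_sets(oversample=8, half_taps=4):
    """One windowed-sinc coefficient set per phase, each normalized to sum 1."""
    sets = []
    for phase in range(oversample):
        frac = phase / oversample
        coeffs = []
        for j in range(-half_taps + 1, half_taps + 1):
            d = frac - j  # distance from input sample j to the output point
            sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
            window = 0.5 + 0.5 * math.cos(math.pi * d / half_taps)
            coeffs.append(sinc * window)
        total = sum(coeffs)
        sets.append([c / total for c in coeffs])
    return sets

def convert_count(data, n_out=44, oversample=8, half_taps=4):
    """Convert len(data) values to n_out values (n_out >= 2)."""
    sets = coefficient_sets(oversample, half_taps)
    # Extend both ends by repeating the edge values.
    ext = [data[0]] * half_taps + list(data) + [data[-1]] * half_taps

    def oversampled(k):
        """Filter output at oversampled index k, computed only on demand."""
        base, phase = divmod(k, oversample)
        start = base + 1  # ext index of input sample base - half_taps + 1
        return sum(c * ext[start + i] for i, c in enumerate(sets[phase]))

    out = []
    for i in range(n_out):
        pos = i * (len(data) - 1) * oversample / (n_out - 1)
        k0 = int(math.floor(pos))
        frac = pos - k0
        y = oversampled(k0)
        if frac > 0.0:
            # Linear interpolation between two neighboring oversampled points.
            y += frac * (oversampled(k0 + 1) - y)
        out.append(y)
    return out

amplitudes = [float(i % 5) for i in range(20)]  # 20 band amplitudes
fixed = convert_count(amplitudes)               # always 44 values
```

Only the oversampled points adjacent to each output position are evaluated, at most two per output value, instead of the full oversampled sequence; this is the thinned-out computation that keeps the product-sum count low.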
[0114]
The data from the data number conversion unit 109 (the fixed number of amplitude data) is sent to the vector quantization unit 110, where it is grouped into vectors of a predetermined number of data and vector-quantized. The quantized output data from the vector quantization unit 110 is supplied to the CRC & rate 1/2 convolutional code adding unit 111 and also to the frame interleave unit 112. The high-precision (fine) pitch data from the high-precision pitch search unit 106 and the voiced/unvoiced (V/UV) discrimination data from the voiced/unvoiced sound discrimination unit 107 are also supplied to the CRC & rate 1/2 convolutional code adding unit 111.
[0115]
Here, the CRC & rate 1/2 convolutional code adding unit 111 is supplied with the fine pitch data, the V/UV discrimination data, and the quantized output data. Because the spectrum envelope is quantized with a hierarchical structure, the output indexes can be divided according to their importance, so that error correction by the convolutional code is applied effectively.
[0116]
This follows the high-efficiency encoding method proposed by the present applicant in Japanese Patent Application No. Hei 4-91422, in which an M-dimensional vector is reduced to an S-dimensional (S < M) vector before vector quantization. As with quantization using a hierarchically structured codebook, this method allows an error correction code to be applied effectively.
[0117]
Specifically, the CRC & rate 1/2 convolutional code addition is based on the following principle. FIG. 15 is a functional block diagram for explaining the principle of the CRC & convolutional coding. For example, the speech parameters output from the speech encoder 121 are divided into 80 bits that are particularly important perceptually (class 1) and the remaining 40 bits (class 2). The CRC calculation block 122 computes a CRC over the 50 most important bits of class 1, yielding a 7-bit result. A total of 92 bits, consisting of the 80 class 1 bits, the 7 CRC bits, and 5 tail bits that return the convolutional encoder to its initial state of 0, is input to the convolutional coding unit 123, which outputs 184 bits. The resulting total of 224 bits, consisting of the 184 convolutionally encoded bits and the 40 class 2 bits, is interleaved by the 2-slot interleaver 124 and transmitted as a 224-bit output.
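The bit counts in this paragraph can be checked with a few lines of arithmetic. The constant names are illustrative only; the numbers are those stated in the text.

```python
CLASS1_BITS = 80       # perceptually important bits, convolutionally coded
CLASS2_BITS = 40       # remaining bits, sent uncoded
CRC_BITS = 7           # CRC over the 50 most important class 1 bits
TAIL_BITS = 5          # return the convolutional encoder to state 0
RATE_INVERSE = 2       # rate 1/2: two output bits per input bit

conv_input = CLASS1_BITS + CRC_BITS + TAIL_BITS   # bits into the coder
conv_output = conv_input * RATE_INVERSE           # bits out of the coder
frame_bits = conv_output + CLASS2_BITS            # bits per transmitted frame
```

The unprotected class 2 bits bypass the coder entirely, so the redundancy is concentrated on the 80 bits whose corruption would be most audible.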
[0118]
The frame interleave unit 112 shown in FIG. 10 corresponds to the 2-slot interleaver 124, and its output is transmitted from the output terminal 113.
[0119]
Each of these data is obtained by processing the data in a block of N samples (for example, 256 samples), but since the block advances on the time axis in units of the frame of L samples, the data to be transmitted is obtained in frame units. That is, the pitch data, V/UV discrimination data, and amplitude data are updated at the frame period.
[0120]
Next, as an embodiment of the decoding apparatus according to the present invention, the schematic configuration of the synthesis side (decoding side), which synthesizes an audio signal from the transmitted output data, will be described with reference to FIG. 16.
[0121]
In FIG. 16, the transmitted output data, to which the CRC & rate 1/2 convolutional code has been added, is supplied to the input terminal 131. The data from the input terminal 131 is supplied to the frame deinterleaver 132 and deinterleaved. The deinterleaved data is supplied to the Viterbi decoding & CRC detection unit 133 and decoded.
[0122]
Then, the mask processing unit 134 masks the data from the frame deinterleaver 132 and supplies the quantized amplitude data to the inverse vector quantization unit 135.
[0123]
The inverse vector quantization unit 135 is also hierarchically structured; it combines the inverse-vector-quantized data obtained from the index data of each layer and outputs the result. The output data from the inverse vector quantization unit 135 is sent to the data number inverse conversion unit 136, which performs the (inverse) conversion corresponding to that described with reference to FIG. 9. The resulting amplitude data is sent to the voiced sound synthesis unit 137 and the unvoiced sound synthesis unit 138. The mask processing unit 134 also supplies the encoded pitch data to the pitch decoding unit 139. The pitch data decoded by the pitch decoding unit 139 is sent to the data number inverse conversion unit 136, the voiced sound synthesis unit 137, and the unvoiced sound synthesis unit 138. The mask processing unit 134 further supplies the V/UV discrimination data to the voiced sound synthesis unit 137 and the unvoiced sound synthesis unit 138.
[0124]
The voiced sound synthesis unit 137 synthesizes the voiced waveform on the time axis by, for example, cosine wave synthesis, and the unvoiced sound synthesis unit 138 synthesizes the unvoiced waveform on the time axis by, for example, filtering white noise with a bandpass filter. The voiced and unvoiced synthesized waveforms are added by the adder 140 and taken out from the output terminal 141. The amplitude data, pitch data, and V/UV discrimination data are updated and supplied once per frame (L samples, for example 160 samples) at analysis time. To increase continuity between frames (smoothing), each amplitude and pitch value is treated as the data value at, for example, the center position of its frame, and the data values up to the center position of the next frame (one synthesis frame) are obtained by interpolation. That is, within one synthesis frame (for example, from the center of one analysis frame to the center of the next analysis frame), the data values at the leading sample point and at the trailing sample point (the head of the next synthesis frame) are given, and the data values between these sample points are obtained by interpolation.
[0125]
Hereinafter, the synthesis process in the voiced sound synthesis unit 137 will be described in detail.
[0126]
Let V_m(n) be the voiced sound for one synthesis frame (L samples, for example 160 samples) on the time axis in the m-th band (the band of the m-th harmonic) determined to be V (voiced). Using the time index (sample number) n within this synthesis frame, it can be expressed as
$$V_m(n) = A_m(n)\cos(\theta_m(n)), \qquad 0 \le n < L \tag{15}$$
The voiced sounds of all bands determined to be V (voiced) are summed (ΣV_m(n)) to synthesize the final voiced sound V(n).
[0127]
A_m(n) in equation (15) is the amplitude of the m-th harmonic interpolated from the head to the end of the synthesis frame. Most simply, the value of the m-th harmonic of the amplitude data updated in frame units may be linearly interpolated. That is, letting A_0m be the amplitude of the m-th harmonic at the head (n = 0) of the synthesis frame and A_Lm be its amplitude at the end of the synthesis frame (n = L: the head of the next synthesis frame), A_m(n) can be calculated as
$$A_m(n) = \frac{(L-n)\,A_{0m}}{L} + \frac{n\,A_{Lm}}{L} \tag{16}$$
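Equation (16) amounts to a straight-line ramp across the synthesis frame, as the following short sketch shows. The function name and the list-based return value are illustrative choices.

```python
def interpolate_amplitude(a_start, a_end, frame_len=160):
    """Amplitude A_m(n) for n = 0 .. frame_len - 1 following equation (16)."""
    return [((frame_len - n) * a_start + n * a_end) / frame_len
            for n in range(frame_len)]

# Example with a short 4-sample frame so the values are easy to read.
ramp = interpolate_amplitude(1.0, 3.0, frame_len=4)
```

The value at n = L is not produced here because it is the first sample of the next synthesis frame.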
[0128]
Next, the phase θ_m(n) in equation (15) can be obtained as
$$\theta_m(n) = m\,\omega_{01}\,n + \frac{n^2\,m\,(\omega_{L1}-\omega_{01})}{2L} + \phi_{0m} + \Delta\omega\,n \tag{17}$$
In equation (17), φ_0m is the phase of the m-th harmonic at the head (n = 0) of the synthesis frame (the frame initial phase), ω_01 is the fundamental angular frequency at the head of the synthesis frame (n = 0), and ω_L1 is the fundamental angular frequency at the end of the synthesis frame (n = L: the head of the next synthesis frame). Δω in equation (17) is set to the smallest value that makes θ_m(L) equal to the phase φ_Lm at n = L.
[0129]
The following describes how to obtain the amplitude A_m(n) and the phase θ_m(n) in an arbitrary m-th band according to the V/UV discrimination results at n = 0 and n = L.
[0130]
When the m-th band is V (voiced) at both n = 0 and n = L, the amplitude A_m(n) may be calculated by linearly interpolating the transmitted amplitude values A_0m and A_Lm according to equation (16). For the phase θ_m(n), Δω is set so that θ_m(0) = φ_0m at n = 0 and θ_m(L) = φ_Lm at n = L.
[0131]
Next, when the band is V (voiced) at n = 0 and UV (unvoiced) at n = L, the amplitude A_m(n) is linearly interpolated from the transmitted amplitude value A_0m at A_m(0) to 0 at A_m(L). The transmitted amplitude value A_Lm at n = L is the amplitude of the unvoiced sound and is used in the unvoiced sound synthesis described later. The phase θ_m(n) is set with θ_m(0) = φ_0m and Δω = 0.
[0132]
Furthermore, when the band is UV (unvoiced) at n = 0 and V (voiced) at n = L, the amplitude A_m(n) is linearly interpolated so that the amplitude A_m(0) at n = 0 is 0 and the amplitude at n = L is the transmitted value A_Lm. For the phase θ_m(n), the phase θ_m(0) at n = 0 is set, using the phase value φ_Lm at the end of the frame, to
$$\theta_m(0) = \phi_{Lm} - \frac{m\,(\omega_{01}+\omega_{L1})\,L}{2} \tag{18}$$
and Δω = 0.
[0133]
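The method of setting Δω so that θ_m(L) becomes φ_Lm when the band is V (voiced) at both n = 0 and n = L is described next. Putting n = L in equation (17) gives
$$\theta_m(L) = m\,\omega_{01}L + \frac{L^2\,m\,(\omega_{L1}-\omega_{01})}{2L} + \phi_{0m} + \Delta\omega\,L = \frac{m\,(\omega_{01}+\omega_{L1})\,L}{2} + \phi_{0m} + \Delta\omega\,L = \phi_{Lm}$$
Rearranging this, Δω becomes
$$\Delta\omega = \frac{\operatorname{mod}2\pi\!\left((\phi_{Lm}-\phi_{0m}) - \dfrac{mL\,(\omega_{01}+\omega_{L1})}{2}\right)}{L} \tag{19}$$
In equation (19), mod2π(x) is a function that returns the principal value of x as a value between −π and +π. For example, mod2π(x) = −0.7π when x = 1.3π, mod2π(x) = 0.3π when x = 2.3π, and mod2π(x) = 0.7π when x = −1.3π.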
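The principal-value function and equation (19) can be sketched as follows. The `phase` helper assumes the quadratic phase track that equation (19) implies (a linear sweep of the fundamental from ω_01 to ω_L1 plus the correction Δω·n); the function names and the example values are illustrative.

```python
import math

def mod2pi(x):
    """Principal value of x in the range [-pi, +pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi

def delta_omega(phi_0, phi_l, m, w_0, w_l, frame_len):
    """Phase correction per sample following equation (19)."""
    return mod2pi((phi_l - phi_0) - m * frame_len * (w_0 + w_l) / 2) / frame_len

def phase(n, phi_0, m, w_0, w_l, frame_len, d_omega):
    """Phase of the m-th harmonic at sample n (assumed quadratic track)."""
    return (m * w_0 * n
            + n * n * m * (w_l - w_0) / (2 * frame_len)
            + phi_0
            + d_omega * n)

# Example: third harmonic, pitch period moving from 40 to 42 samples.
L = 160
w_0, w_l = 2 * math.pi / 40, 2 * math.pi / 42
d_w = delta_omega(0.3, 1.1, 3, w_0, w_l, L)
theta_end = phase(L, 0.3, 3, w_0, w_l, L, d_w)
```

Because Δω is built from a principal value, it is the smallest correction that makes the end-of-frame phase agree with φ_Lm modulo 2π.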
[0134]
Here, FIG. 17A shows an example of the spectrum of an audio signal, in which the bands with band numbers (harmonic numbers) m of 8, 9, and 10 are UV (unvoiced) and the other bands are V (voiced). The time-axis signals of the V (voiced) bands are synthesized by the voiced sound synthesis unit 137, and the time-axis signals of the UV (unvoiced) bands are synthesized by the unvoiced sound synthesis unit 138.
[0135]
Hereinafter, the unvoiced sound synthesis process in the unvoiced sound synthesis unit 138 will be described.
[0136]
The white noise signal waveform on the time axis from the white noise generation unit 142 is windowed with an appropriate window function (for example, a Hamming window) of a predetermined length (for example, 256 samples), and the STFT processing unit 143 applies an STFT (short-term Fourier transform) to obtain the power spectrum of the white noise on the frequency axis, as shown in FIG. 17B. The power spectrum from the STFT processing unit 143 is sent to the band amplitude processing unit 144, which, as shown in FIG. 17C, multiplies the bands determined to be UV (for example, m = 8, 9, 10) by the amplitude |A_m|_UV and sets the amplitude of the other, V (voiced) bands to 0. The band amplitude processing unit 144 is supplied with the amplitude data, pitch data, and V/UV discrimination data. The output of the band amplitude processing unit 144 is sent to the ISTFT processing unit 145, which converts it into a time-axis signal by inverse STFT processing, using the phase of the original white noise as the phase. The output of the ISTFT processing unit 145 is sent to the overlap adder 146, which repeatedly overlaps and adds it on the time axis with appropriate weighting (so that the original continuous noise waveform can be restored), thereby synthesizing a continuous time-axis waveform. The output signal from the overlap adder 146 is sent to the adder 140.
[0137]
As described above, the voiced and unvoiced signals synthesized by the synthesis units 137 and 138 and returned to the time axis are added by the adder 140 at an appropriate fixed mixing ratio, and the reproduced audio signal is taken out from the output terminal 141.
[0138]
Here, the Viterbi decoding & CRC detection on the decoder side described above is based on the following principle. FIG. 18 is a functional block diagram for explaining the principle of Viterbi decoding & CRC detection. First, the 2-slot deinterleaver 151 receives the transmitted 224 bits and deinterleaves them. The output of the 2-slot deinterleaver 151 is divided into the class 2 bits and the encoded class 1 bits; the latter are input to the convolutional decoder 152 and decoded, yielding the 80-bit class 1 decoding result and the 7-bit received CRC. Next, the CRC calculation unit 153 recalculates the CRC from the bits of the 80-bit class 1 decoding result that correspond to the parameter bits over which the encoder computed the CRC, compares it with the received CRC, and outputs the result to the speech decoder 154.
[0139]
Note that the configuration on the speech analysis side (encoding side) in FIG. 10 and the configuration on the speech synthesis side (decoding side) in FIG. 16 have been described as hardware, but they can also be realized by a software program using, for example, a so-called DSP (digital signal processor).
The present invention is not limited to the embodiments described above; for example, not only a speech signal but also an acoustic signal can be used as the input signal.
[0140]
[Effects of the Invention]
As is apparent from the above description, the encoding device of the present invention divides an input audio signal into blocks, extracts a variable number of waveform data or parameter data representing the waveform within each block, and, in order to compare the extracted variable number of data with a fixed number of reference data for each block, converts the variable number of data into the fixed number of data and encodes it. The device has means that is a band-limited oversampling FIR filter to which the variable number of data is input and that obtains the fixed number of data required as output by using, from among a plurality of coefficient sets corresponding to a plurality of different phases relative to the sample points of the input data, the coefficient sets corresponding to the positions of the fixed number of data. A thinned-out computation that calculates only the necessary points is therefore possible, and the number of product-sum operations is greatly reduced.
[0141]
Further, the encoding device of another invention divides an input audio signal into blocks, extracts a variable number of waveform data or parameter data representing the waveform within each block, and, in order to compare the extracted variable number of data with a fixed number of reference data for each block, converts the variable number of data into the fixed number of data and encodes it. The device has means that is a band-limited oversampling FIR filter to which the variable number of data is input and that obtains intermediate output data by using, from among a plurality of coefficient sets corresponding to a plurality of different phases relative to the sample points of the input data, the coefficient sets corresponding to positions in the vicinity of each position of the fixed number of data, and means that obtains the required fixed number of data by interpolating the intermediate output data. A thinned-out computation that calculates only the necessary points is therefore possible, and the number of product-sum operations is greatly reduced.
[0142]
The decoding device of the present invention receives a code string produced by dividing an input audio signal into blocks, extracting a variable number of waveform data or parameter data representing the waveform within each block, and, in order to compare the extracted variable number of data with a fixed number of reference data for each block, converting the variable number of data into the fixed number of data and encoding it. The device decodes the fixed number of data from the code string and inversely converts the decoded fixed number of data into the variable number of data. It has means that is a band-limited oversampling FIR filter to which the fixed number of data is input and that obtains the variable number of data required as output by using, from among a plurality of coefficient sets corresponding to a plurality of different phases relative to the sample points of the input data, the coefficient sets corresponding to the positions of the variable number of data. A thinned-out computation that calculates only the necessary points is therefore possible, and the number of product-sum operations can be greatly reduced.
[0143]
The decoding device of another invention likewise receives a code string produced by dividing an input audio signal into blocks, extracting a variable number of waveform data or parameter data representing the waveform within each block, and, in order to compare the extracted variable number of data with a fixed number of reference data for each block, converting the variable number of data into the fixed number of data and encoding it. The device decodes the fixed number of data from the code string and inversely converts the decoded fixed number of data into the variable number of data. It has means that is a band-limited oversampling FIR filter to which the fixed number of data is input and that obtains intermediate output data by using, from among a plurality of coefficient sets corresponding to a plurality of different phases relative to the sample points of the input data, the coefficient sets corresponding to the positions of the variable number of data, and means that obtains the required variable number of data by interpolating the intermediate output data. A thinned-out computation that calculates only the necessary points is therefore possible, and the number of product-sum operations can be greatly reduced.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration for explaining a data number conversion method used in a first embodiment of an encoding apparatus according to the present invention;
FIG. 2 is a waveform diagram for explaining an example of a change in the number of data.
FIG. 3 is a waveform diagram for explaining expansion of a spectrum envelope.
FIG. 4 is a diagram for explaining filter coefficients of an FIR filter.
FIG. 5 is a diagram for explaining an example in which an output point is actually obtained using the filter coefficient shown in FIG. 4;
FIG. 6 is a diagram for explaining how to obtain values used in linear interpolation and linear interpolation;
FIG. 7 is a flowchart for explaining how to obtain values used in linear interpolation;
FIG. 8 is a flowchart for explaining linear interpolation;
FIG. 9 is a diagram for explaining a second embodiment;
FIG. 10 is a functional block diagram showing a schematic configuration of an analysis side (encoding side) of a speech signal synthesis analysis encoding apparatus as a specific example of an embodiment of an encoding apparatus according to the present invention.
FIG. 11 is a diagram for explaining windowing processing;
FIG. 12 is a diagram for explaining a relationship between windowing processing and a window function.
FIG. 13 is a diagram illustrating time axis data as an orthogonal transform (FFT) processing target.
FIG. 14 is a diagram showing spectrum data on the frequency axis, a spectrum envelope (envelope), and a power spectrum of an excitation signal.
FIG. 15 is a diagram for explaining a CRC & convolutional code;
FIG. 16 is a functional block diagram showing a schematic configuration of the synthesis side (decoding side) of a speech signal synthesis analysis coding apparatus as a specific example of an apparatus to which the data number conversion method is applied, as an embodiment of the decoding apparatus according to the present invention.
FIG. 17 is a diagram for explaining unvoiced sound synthesis when a voice signal is synthesized;
FIG. 18 is a diagram for explaining CRC & convolutional decoding.
[Explanation of symbols]
12 nonlinear compression unit, 13 data number conversion main unit, 14 spectrum envelope expansion unit, 15 band-limited FIR filter, 16 linear interpolation unit, 103 pitch extraction unit, 104 windowing processing unit, 105 orthogonal transform (FFT) unit, 106 high-precision (fine) pitch search unit, 107 voiced/unvoiced sound (V/UV) discrimination unit, 108 amplitude re-evaluation unit, 109 data number conversion (data rate conversion) unit, 110 vector quantization unit, 111 CRC & convolutional code adding unit, 112 frame interleave unit

Claims

An encoding device that divides an input audio signal into blocks, extracts a variable number of waveform data or parameter data representing the waveform within each block, and, in order to compare the extracted variable number of series data with a fixed number of reference data for each block, converts the variable number of series data into the fixed number of series data and encodes it, the encoding device comprising:
storage means for storing a plurality of coefficient sets;
means for obtaining intermediate output data by adding data to both ends of the series of the variable number of series data to generate new series data consisting of a predetermined fixed number of data, selecting from the storage means the coefficient set corresponding to each position of the fixed number of data, multiplying each of the plurality of coefficients included in the selected coefficient set by the new series data associated with that coefficient, and adding the plurality of values calculated by the multiplication; and
means for interpolating the intermediate output data to obtain the required number of series data.

A decoding device that receives a code sequence produced by dividing an input audio signal into blocks, extracting a variable number of waveform data or parameter data representing the waveform within each block, and, in order to compare the extracted variable number of series data with a fixed number of reference data for each block, converting the variable number of series data into the fixed number of series data and encoding it, that decodes the fixed number of series data from the code sequence, and that inversely converts the decoded fixed number of series data into the variable number of series data, the decoding device comprising:
storage means for storing a plurality of coefficient sets;
means for obtaining intermediate output data by adding data to both ends of the series of the decoded fixed number of series data to generate new series data consisting of a predetermined fixed number of data, selecting from the storage means the coefficient set corresponding to each position of the predetermined number of data, multiplying each of the plurality of coefficients included in the selected coefficient set by the new series data associated with that coefficient, and adding the plurality of values calculated by the multiplication; and
means for interpolating the intermediate output data to obtain the required variable number of series data.