JP2000122679A - Audio range expanding method and device, and speech synthesizing method and device

Info

Publication number: JP2000122679A
Application number: JP10294010A
Authority: JP
Inventors: Shiro Omori; 士郎大森; Masayuki Nishiguchi; 正之西口
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1998-10-15
Filing date: 1998-10-15
Publication date: 2000-04-28

Abstract

PROBLEM TO BE SOLVED: To reduce the number of wideband formants so as to emphasize the rough structure of the spectrum, thereby improving the quality of the resulting wideband speech, and to save the memory capacity and the amount of computation required for the codebook search. SOLUTION: In this speech bandwidth extension device, a parameter conversion unit 7 converts the linear prediction coefficients α into an autocorrelation r; a vector quantization unit 8 and a vector dequantization unit 9 generate a wideband autocorrelation r_W from r; a parameter inverse conversion unit 10 converts r_W back into wideband linear prediction coefficients α_W; and an LPC synthesis unit 11 synthesizes wideband speech from α_W.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech bandwidth extension method and apparatus and a speech synthesis method and apparatus. They are used where a narrowband speech signal, or the parameters that constitute it, conveyed by communication or broadcasting or stored on a medium, is transmitted or recorded unchanged at the transmitter, on the transmission path, or within the medium, and a wideband speech signal is estimated on the receiving or playback side. They are particularly suitable for mobile phone terminals that have a bandwidth extension function.

[0002]

[Prior Art] Bandwidth extension techniques already exist for speech coded with VSELP (Vector Sum Excited Linear Prediction) or PSI-CELP (Pitch Synchronous Innovation Code Excited Linear Prediction), the speech codecs of car phones and mobile phones under Japan's PDC (Personal Digital Cellular) system, which uses a frequency band of 300 Hz to 3400 Hz. These techniques estimate the out-of-band signal components on the receiving side and widen the band to approximately 300 Hz to 6000 Hz.

[0003] In this technique, wideband parameters are estimated from the transmitted parameters, and wideband speech is synthesized with the resulting wideband parameters. Because the 300 Hz to 3400 Hz component of the original speech signal is less distorted than the 300 Hz to 3400 Hz component of the synthesized speech signal, sound quality is further improved by, for example, removing the 300 Hz to 3400 Hz component from the synthesized signal with a filter and adding the 300 Hz to 3400 Hz component of the original speech signal to the band-removed synthesized signal.

[0004] The transmitted narrowband parameters include linear prediction coefficients (α), reflection coefficients (k), and line spectrum pairs (LSP). These represent the spectral envelope of speech, and the order (number) of the coefficients corresponds to the number of spectral peaks. In the PDC system, because human speech up to about 3400 Hz has about five formants, coefficients up to the 10th order are transmitted.

[0005] Various methods are conceivable for predicting the wideband parameters that represent the wideband formants; one of them uses vector quantization. In this method, a number of vectors whose dimension equals the order of the wideband parameters are prepared in advance by training, and when narrowband parameters are input, the wideband parameters are obtained by selecting an appropriate wideband vector from among them.

[0006]

[Problems to Be Solved by the Invention] When wideband speech is estimated and synthesized from the transmitted narrowband parameters, the number of formants naturally becomes larger than five, the number in the narrow band.

[0007] However, the higher the order of the parameters representing the wideband formants, the larger the dimension of the vectors used in the vector quantization. This first of all increases the required memory capacity, and it also increases the amount of processing required for the codebook search.

[0008] Moreover, a larger number of formants means that the comparison extends to the fine structure of the spectral envelope. This departs from the originally important goal of estimating the rough wideband spectral envelope and brings little benefit.

[0009] The present invention has been made in view of these circumstances. Its object is to provide a speech bandwidth extension method and apparatus and a speech synthesis method and apparatus that reduce the number of wideband formants so as to emphasize the rough structure of the spectrum, thereby improving the quality of the resulting wideband speech, and that also save memory capacity and the amount of computation required for the codebook search.

[0010]

[Means for Solving the Problems] The speech bandwidth extension method of the present invention solves the above problems by comprising: a parameter extraction step of obtaining, from an input narrowband speech signal, parameters capable of representing narrowband formants; a step of predicting parameters capable of representing wideband formants whose number is not greater than the number of narrowband formants obtained; and a step of synthesizing wideband speech from the obtained parameters representing the wideband formants.

[0011] The speech bandwidth extension apparatus of the present invention solves the above problems by comprising: parameter extraction means for obtaining, from an input narrowband speech signal, parameters capable of representing narrowband formants; parameter prediction means for predicting parameters capable of representing wideband formants whose number is not greater than the number of narrowband formants obtained; and synthesis means for synthesizing wideband speech from the obtained parameters representing the wideband formants.

[0012] The speech synthesis method of the present invention solves the above problems by comprising: a first parameter prediction step of predicting, from those of the narrowband parameters representing input narrowband speech that can represent narrowband formant information, parameters capable of representing wideband formants whose number is not greater than the number of narrowband formants; a parameter extraction step of obtaining, from input narrowband speech, parameters capable of representing narrowband formant information; a second parameter prediction step of predicting parameters capable of representing wideband formants whose number is not greater than the number of narrowband formants obtained; and a synthesis step of synthesizing wideband speech from the obtained parameters representing the wideband formants.

[0013] The speech synthesis apparatus of the present invention solves the above problems by comprising: first parameter prediction means for predicting, from those of the narrowband parameters representing input narrowband speech that can represent narrowband formant information, parameters capable of representing wideband formants whose number is not greater than the number of narrowband formants; parameter extraction means for obtaining, from input narrowband speech, parameters capable of representing narrowband formant information; second parameter prediction means for predicting parameters capable of representing wideband formants whose number is not greater than the number of narrowband formants obtained; and synthesis means for synthesizing wideband speech from the obtained parameters representing the wideband formants.

[0014] That is, although the number of formants to be estimated should by rights be larger than the number of formants in the narrow band, the present invention deliberately does not increase it, or even makes it smaller than in the narrow band, so that only the rough structure of the spectrum is estimated.

[0015]

[Embodiments of the Invention] Preferred embodiments of the present invention are described below with reference to the drawings.

[0016] As one embodiment of the speech bandwidth extension method and apparatus and the speech synthesis method and apparatus of the present invention, an example applied to the VSELP and PSI-CELP schemes, which are the PDC codec schemes, is described.

[0017] In this embodiment, wideband parameters are estimated from narrowband parameters, wideband LPC synthesis is performed, and then the low-band side of the synthesized speech signal, which is the frequency band of the original speech, is replaced with the original speech signal. That is, the synthesized speech signal is high-pass filtered so that only the high band remains, the highest frequency components within that high band are suppressed, the gain is adjusted, and the result is added to the original speech signal.

[0018] Estimating the wideband parameters requires two things: widening the band of the linear prediction coefficients α, which are parameters representing the spectral envelope (that is, the formant information), and widening the band of the excitation source. Widening the band of α in turn requires a codebook prepared in advance in terms of the autocorrelation r, a parameter that is mutually convertible with α. The autocorrelation r is widened in band through quantization and dequantization with this codebook.

[0019] The description below refers to both FIG. 1 and FIG. 2 and follows the processing flow: extension of the linear prediction coefficients α, extension of the excitation source, wideband LPC synthesis, and replacement of the low band. The method of creating the codebook is described afterwards. FIG. 1 is a block diagram of this embodiment applied to the PSI-CELP scheme, and FIG. 2 is a block diagram of this embodiment applied to the VSELP scheme. In FIGS. 1 and 2, parts that perform the same processing are given the same reference numerals.

[0020] First, the extension of the linear prediction coefficients α is described.

[0021] This embodiment exploits the fact that the linear prediction coefficients α are filter coefficients representing the spectral envelope. The parameter conversion unit 7 first converts them into the autocorrelation r, another parameter representing the spectral envelope, from which the high-band side is easier to estimate. The autocorrelation is then widened in band, and the parameter inverse conversion unit 10 converts the resulting wideband autocorrelation r_W back into wideband linear prediction coefficients α_W.
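The source does not say how the parameter conversion unit 7 and the inverse conversion unit 10 are implemented. A minimal sketch, assuming the standard Levinson-Durbin recursion and its inverse, shows that α and r are mutually convertible. The function names and the sign convention (x[n] predicted as the sum of alpha[k] * x[n-k]) are assumptions made for this illustration.

```python
def autocorr_to_lpc(r, order):
    """Levinson-Durbin: autocorrelation r[0..order] -> predictor alpha[1..order].

    Assumed convention: x[n] is predicted as sum(alpha[k] * x[n-k]).
    Returns (alpha, reflection), both lists of length `order`.
    """
    alpha, reflection = [], []
    err = r[0]                                   # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(alpha[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err                            # i-th reflection coefficient
        alpha = [alpha[j] - k * alpha[i - 2 - j] for j in range(i - 1)] + [k]
        reflection.append(k)
        err *= 1.0 - k * k
    return alpha, reflection


def lpc_to_autocorr(alpha, r0=1.0):
    """Inverse recursion: predictor alpha[1..p] -> autocorrelation r[0..p] with r[0] = r0.

    Requires a stable predictor (all reflection coefficients below 1 in magnitude).
    """
    p = len(alpha)
    predictors = [None] * (p + 1)                # predictors[i] holds the order-i predictor
    predictors[p] = list(alpha)
    for i in range(p, 0, -1):                    # step down to every lower order
        a = predictors[i]
        k = a[i - 1]
        denom = 1.0 - k * k
        predictors[i - 1] = [(a[j] + k * a[i - 2 - j]) / denom for j in range(i - 1)]
    r, err = [r0], r0
    for i in range(1, p + 1):                    # step up, one lag at a time
        prev = predictors[i - 1]
        k = predictors[i][i - 1]
        r.append(sum(prev[j] * r[i - 1 - j] for j in range(i - 1)) + k * err)
        err *= 1.0 - k * k
    return r
```

Converting an autocorrelation to α and back with these two functions returns the original lags up to rounding, which is the property the embodiment relies on when it widens the band in the autocorrelation domain.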

[0022] Vector quantization is used for this extension (band widening) of the autocorrelation r. The narrowband autocorrelation r_N is vector-quantized by the vector quantization unit 8, and the resulting index is vector-dequantized by the vector dequantization unit 9, so that the corresponding wideband autocorrelation r_W is obtained from the index.

[0023] Because a fixed relationship holds between the narrowband and wideband autocorrelations, as described below, only a codebook of wideband autocorrelations needs to be prepared. The narrowband autocorrelation can be vector-quantized with it, and the wideband autocorrelation is obtained by vector dequantization.

[0024] If the narrowband signal is regarded as a band-limited version of the wideband signal, the following relationship holds between the wideband autocorrelation and the narrowband autocorrelation.

[0025]

(Equation 1)

[0026] Here, φ denotes autocorrelation, x_n the narrowband signal, x_w the wideband signal, and h the impulse response of the band-limiting filter.

[0027] Furthermore, from the relationship between the autocorrelation and the power spectrum, the following equation is obtained.

[0028]

(Equation 2)

[0029] Consider another band-limiting filter whose frequency response equals the power response of this band-limiting filter, and call it H'. Then the following equation is obtained.

[0030]

(Equation 3)

[0031] The passband and stopband of this new filter are the same as those of the original band-limiting filter, and its attenuation characteristic is squared. This new filter can therefore also be regarded as a band-limiting filter.

[0032] Taking this into account, the narrowband autocorrelation simplifies to the convolution of the wideband autocorrelation with the impulse response of a band-limiting filter, that is, to a band-limited version of the wideband autocorrelation. The following equation is therefore obtained.

[0033]

(Equation 4)
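Equations 1 to 4 are images in the original publication and are not reproduced in this text. A plausible reconstruction from the surrounding definitions is sketched below. The convolution symbol, the Fourier-transform notation, and the exact layout are assumptions; only the symbols φ, x_n, x_w, h, and H' come from the text.

```latex
\begin{align}
\phi(x_n) &= \phi(x_w \otimes h) = \phi(x_w) \otimes \phi(h) \tag{1}\\
\phi(h) &= \mathcal{F}^{-1}\!\left(|H|^{2}\right) \tag{2}\\
H' = |H|^{2} \;\Rightarrow\; \phi(h) &= \mathcal{F}^{-1}(H') = h' \tag{3}\\
\phi(x_n) &= \phi(x_w) \otimes h' \tag{4}
\end{align}
```

Read this way, (1) is the usual identity for the autocorrelation of a filtered signal, (2) is the autocorrelation and power-spectrum pair applied to the filter, (3) names the filter whose frequency response is the squared magnitude, and (4) is the simplified result that the text states in words.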

[0034] From the above, when vector-quantizing the narrowband autocorrelation, it suffices to prepare only the wideband codebook. The narrowband vectors needed at quantization time can be computed from it, so there is no need to prepare a codebook from narrowband autocorrelations in advance.
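The relationship that makes a separate narrowband codebook unnecessary can be checked numerically. The sketch below, with made-up sample values and a made-up three-tap low-pass filter, confirms that the autocorrelation of a filtered signal equals the autocorrelation of the unfiltered signal convolved with the autocorrelation of the filter. Decimation to the narrowband sampling rate is deliberately left out.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def autocorr_full(x):
    """Deterministic autocorrelation at all lags -(N-1)..(N-1)."""
    return convolve(x, x[::-1])


x_w = [1.0, -0.5, 0.25, 0.75, -1.0, 0.5]   # made-up "wideband" samples
h = [0.25, 0.5, 0.25]                      # made-up band-limiting (low-pass) filter

x_n = convolve(x_w, h)                     # band-limited signal
lhs = autocorr_full(x_n)                   # autocorrelation of the band-limited signal
rhs = convolve(autocorr_full(x_w), autocorr_full(h))
max_diff = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Here `max_diff` is at the level of floating-point rounding, so a narrowband autocorrelation vector can be derived from a wideband one by filtering alone.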

[0035] Furthermore, each wideband autocorrelation r_W code vector follows a curve that decreases monotonically or rises and falls gently, so passing it through the band-limiting (low-pass) filter H' changes it little. The narrowband autocorrelation r_N can therefore be quantized directly against the wideband autocorrelation r_W codebook. However, because the narrowband sampling frequency is half the wideband one, r_N must be compared with the r_W code vector taken at every other order by the every-other-order unit 4.
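A minimal sketch of this search follows. It assumes that each wideband code vector stores normalized autocorrelation lags 1 to 6, that the narrowband input holds lags 1 to 3 at half the sampling rate, and that squared error is the distance measure; none of these layout details are given in the source, and the function names are hypothetical.

```python
def quantize_narrowband(r_n, wideband_codebook):
    """Return the index of the wideband code vector whose every-other lags best match r_n."""
    best_index, best_dist = -1, float("inf")
    for index, r_w in enumerate(wideband_codebook):
        every_other = r_w[1::2]              # wideband lags 2, 4, 6 = narrowband lags 1, 2, 3
        dist = sum((a - b) ** 2 for a, b in zip(r_n, every_other))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def dequantize_wideband(index, wideband_codebook):
    """Vector dequantization: the index selects the full wideband vector."""
    return wideband_codebook[index]
```

Because the same codebook serves both directions, the index found by the narrowband comparison directly yields the wideband autocorrelation.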

[0036] In PDC, autocorrelation parameters up to the 10th order are available in the narrow band. Lower orders represent the rough spectral envelope, and higher orders represent finer structure. Wideband speech with a raised sampling frequency would ordinarily call for autocorrelations up to about the 20th order, but this embodiment emphasizes the rough spectral envelope and, to save computation and memory, computes them only up to about the 6th order, for example. In that case the wideband codebook is six-dimensional.
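To make the saving concrete, the sketch below compares a 20-dimensional codebook with a 6-dimensional one. The entry count of 256 and the 2 bytes per value are hypothetical; the source gives neither figure.

```python
def codebook_cost(entries, dimension, bytes_per_value=2):
    """Memory in bytes and multiply-adds per exhaustive search for one codebook."""
    memory = entries * dimension * bytes_per_value
    macs_per_search = entries * dimension
    return memory, macs_per_search


full = codebook_cost(entries=256, dimension=20)     # order the text calls natural for wideband
reduced = codebook_cost(entries=256, dimension=6)   # order chosen in this embodiment
ratio = reduced[0] / full[0]
```

Under these assumptions both memory and search cost fall to 30% of the 20th-order case, since each scales linearly with the vector dimension.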

[0037] The extension of the linear prediction coefficients α can be made more accurate by treating voiced sounds (V) and unvoiced sounds (UV) separately, and this embodiment does so. The V/UV discrimination unit 2 classifies the decoded speech signal, and processing is performed according to the result. Accordingly, the vector quantization unit 8 and the vector dequantization unit 9 use two codebooks, one for voiced sounds (V) and one for unvoiced sounds (UV).

[0038] Next, the extension of the excitation source is described.

[0039] In the PSI-CELP scheme used in FIG. 1, the narrowband excitation source is upsampled by inserting zero values in the zero-stuffing unit 20, and the result, in which aliasing distortion has been generated, is used. Although this method is very simple, the power of the original speech and the differences in its harmonic structure are preserved, so the quality is sufficient for an excitation source.
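Zero insertion can be sketched in a few lines. The example below also computes DFT magnitudes to show the effect the text describes: the spectrum of the zero-stuffed signal is the narrowband spectrum repeated, so the high band is filled with an image of the low band. The function names and sample values are illustrative only.

```python
import cmath


def zero_stuff(excitation, factor=2):
    """Upsample by inserting factor-1 zeros after every sample (no interpolation filter)."""
    out = []
    for sample in excitation:
        out.append(sample)
        out.extend([0.0] * (factor - 1))
    return out


def dft_magnitudes(x):
    """Magnitude of the discrete Fourier transform, computed directly."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]


narrow = [1.0, 0.5, -0.25, 0.0]
wide = zero_stuff(narrow)
narrow_mag = dft_magnitudes(narrow)
wide_mag = dft_magnitudes(wide)            # equals narrow_mag repeated twice
```

The sample energy is unchanged by the inserted zeros, which matches the statement that the power of the original excitation is preserved.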

[0040] In the VSELP scheme used in FIG. 2, however, the vowels of the original speech are already somewhat muddy. If the zero-insertion method above is applied to the excitation source as it is, harsh noise remains in the high band. To improve this, the embodiment shown in FIG. 2 performs the following processing.

[0041] The VSELP excitation source in FIG. 2 is formed from the codec parameters β (long-term prediction coefficient), bL[i] (long-term filter state), γl (gain), and cl[i] (excitation code vector) as β*bL[i] + γl*cl[i]. The former term represents the pitch component and the latter the noise component, so in the example of FIG. 2 the excitation is split into β*bL[i] (first excitation source E1) and γl*cl[i] (second excitation source E2). The frame energy comparison unit 22 compares their energies for each subframe. When the energy of the former (first excitation source E1) is larger, only the pitch component is emphasized and the excitation source is made a pulse train: the pitch component detection unit 23 detects whether each sample value of E1 is at or above a fixed value, that is, whether a pitch component is present. Where a pitch component is present, the sample value of E1 is used; where it is not, the sample is suppressed to 0. When the frame energy comparison unit 22 finds that the energy of E1 is not larger than that of E2, the conventional excitation, that is, the sum of E1 and E2, is used. The narrowband excitation source produced in this way is zero-stuffed by the zero-stuffing unit 21, as in the PSI-CELP scheme, to generate the wideband excitation source. Written in C-like form, this processing is as follows.

[0042]

(Equation 5)
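The C-like listing is an image in the original publication and is not reproduced in this text. The sketch below is not that listing; it is an illustration in Python of the processing described in paragraph [0041], for one subframe. The function and argument names are hypothetical, and comparing the absolute sample value against the threshold is an assumption, since the text only says "at or above a fixed value".

```python
def vselp_wideband_excitation(beta, b_l, gamma, c, pitch_threshold):
    """One subframe: build the narrowband excitation, then zero-stuff it to double rate."""
    e1 = [beta * v for v in b_l]        # pitch component (first excitation source E1)
    e2 = [gamma * v for v in c]         # noise component (second excitation source E2)
    energy1 = sum(v * v for v in e1)
    energy2 = sum(v * v for v in e2)
    if energy1 > energy2:
        # Pitch-dominated subframe: keep E1 only where a pitch pulse is detected.
        narrow = [v if abs(v) >= pitch_threshold else 0.0 for v in e1]
    else:
        # Otherwise use the conventional excitation E1 + E2.
        narrow = [a + b for a, b in zip(e1, e2)]
    wide = []
    for v in narrow:
        wide.extend([v, 0.0])           # zero stuffing, as in the PSI-CELP case
    return wide
```

In a pitch-dominated subframe the output is a sparse pulse train, which is what removes the noisy component that would otherwise be mirrored into the high band.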

[0043] Next, as wideband LPC synthesis, the LPC synthesis unit 11 performs LPC synthesis using the wideband linear prediction coefficients α_W and the wideband excitation source obtained above.
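LPC synthesis is an all-pole filter driven by the excitation. A minimal sketch follows, assuming the sign convention s[n] = e[n] + sum over k of alpha[k] * s[n-k]; the source does not state its convention, and the function name is illustrative.

```python
def lpc_synthesize(excitation, alpha):
    """All-pole synthesis filter: s[n] = e[n] + sum_k alpha[k] * s[n-k], k = 1..p."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(alpha, start=1):
            if n - k >= 0:
                acc += a * out[n - k]   # feedback from previously synthesized samples
        out.append(acc)
    return out


impulse_response = lpc_synthesize([1.0, 0.0, 0.0, 0.0], [0.5])
```

With a single coefficient of 0.5, a unit impulse produces a geometrically decaying response, the simplest case of the spectral shaping that α_W applies to the wideband excitation.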

[0044] Next, the replacement of the low band is described.

[0045] The speech produced by wideband LPC synthesis in the LPC synthesis unit 11 contains estimation errors, not least because the number of formants has been reduced, and its quality is poor as it stands. In this embodiment the low-band side is therefore replaced with the original speech signal SND_N output by the codec. To this end, the narrowband frequency range removal unit 12 extracts the components at 4 kHz and above from the synthesized signal from the LPC synthesis unit 11, while the upsampling unit 5 upsamples the codec output to fs = 16 kHz, and the adder 17 adds the two.

[0046] In this embodiment the high-band gain can be adjusted to taste. Because preferred sound quality differs greatly from user to user, making this gain variable is important. The gain value is set in advance by user input, and the multiplication unit 16 multiplies the high-band signal by this value to adjust the high-band gain.
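The low-band replacement and gain adjustment can be sketched as follows. The source does not specify the filter used by the narrowband frequency range removal unit 12, so a Hamming-windowed sinc high-pass FIR with 31 taps is assumed here purely for illustration. The codec output is assumed to be already upsampled to 16 kHz and is delayed by the filter's group delay so that the two paths line up.

```python
import math


def lowpass_fir(cutoff, taps):
    """Hamming-windowed sinc low-pass; cutoff in cycles/sample, taps must be odd."""
    mid = (taps - 1) // 2
    h = []
    for n in range(taps):
        m = n - mid
        ideal = 2.0 * cutoff if m == 0 else math.sin(2.0 * math.pi * cutoff * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    total = sum(h)
    return [v / total for v in h]          # unity gain at DC


def highpass_fir(cutoff, taps):
    """Spectral inversion of the low-pass: unit impulse minus low-pass."""
    h = [-v for v in lowpass_fir(cutoff, taps)]
    h[(taps - 1) // 2] += 1.0
    return h


def fir_filter(x, h):
    """Causal FIR filtering with zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def replace_low_band(synth_wb, codec_up, gain=1.0, fs=16000.0, cutoff_hz=4000.0, taps=31):
    """High-pass the synthesized speech, scale it, and add the upsampled codec output."""
    high = fir_filter(synth_wb, highpass_fir(cutoff_hz / fs, taps))
    delay = (taps - 1) // 2                # group delay of the linear-phase FIR
    delayed = [0.0] * delay + list(codec_up[:len(codec_up) - delay])
    return [lo + gain * hi for lo, hi in zip(delayed, high)]
```

The `gain` argument plays the role of the user-set value applied by the multiplication unit 16; values below 1.0 soften the added high band without touching the codec's low band.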

[0047] In addition, before the addition in the adder 17, the high-band suppression filter unit 13 applies to the high-band signal a filter that slightly suppresses components above about 6 kHz, making the sound easier to listen to. The filter coefficients are selectable; filtering with a preselected set of coefficients lets the user choose the high-band frequency range to taste. This filter selection is also set by user input.

[0048] Since the high-band suppression filtering by the high-band suppression filter unit 13 does not affect the power characteristics of the low band, it may instead be performed after the addition in the adder 17. Alternatively, filtering that deliberately affects the low band as well can be applied after the addition in the adder 17.

[0049] Wideband speech is obtained as described above.

[0050] Next, the method of creating the codebook is described.

[0051] In this embodiment, the codebook is created before the bandwidth extension processing described above. FIG. 3 is a block diagram for generating the codebook training data, and FIG. 4 is a block diagram for generating the codebook.

[0052] The codebook is created with the well-known generalized Lloyd algorithm (GLA).

[0053] In FIG. 3, the framing unit 31 first divides the wideband speech signal into frames of a fixed duration, for example 20 ms. The V/UV classification unit 32 then classifies each frame as voiced (V) or unvoiced (UV), and the autocorrelation calculation units 33 and 34 compute, for each voiced and unvoiced frame respectively, the autocorrelation up to a fixed order, for example the 6th. The per-frame autocorrelations r of the voiced (V) and unvoiced (UV) frames serve as the training data.
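The framing and autocorrelation steps can be sketched as follows. The 16 kHz sampling rate, the non-overlapping frames, and the normalization by lag 0 are assumptions for this illustration; the V/UV classification is not shown because the source does not describe how the decision is made.

```python
def frame_signal(samples, fs=16000, frame_ms=20):
    """Split a signal into consecutive non-overlapping frames; a short tail is dropped."""
    size = fs * frame_ms // 1000
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def autocorrelation(frame, order=6):
    """Lags 1..order, normalized by lag 0 (all zeros for a silent frame)."""
    r0 = sum(v * v for v in frame)
    if r0 == 0.0:
        return [0.0] * order
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame))) / r0
            for k in range(1, order + 1)]


frames = frame_signal([1.0] * 700)
training_vectors = [autocorrelation(f) for f in frames]
```

Each training vector then has six entries, matching the six-dimensional codebook used in this embodiment.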

[0054] Next, in FIG. 4, the wideband parameter extraction unit 41 extracts wideband parameters from the per-frame autocorrelations r of the voiced (V) and unvoiced (UV) sounds, and the codebook learning unit 42 creates a six-dimensional codebook.
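A minimal sketch of generalized Lloyd training follows: each training vector is assigned to its nearest code vector, and each code vector is then replaced by the centroid of its cell. The squared-error distance, the fixed iteration count, and the caller-supplied initial codebook are assumptions; the source names the algorithm but gives no such details.

```python
def nearest(vector, codebook):
    """Index of the code vector with the smallest squared error to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, codebook[i])))


def train_codebook(training, initial_codebook, iterations=20):
    """Generalized Lloyd iterations: nearest-neighbor partition, then centroid update."""
    codebook = [list(v) for v in initial_codebook]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for vector in training:
            cells[nearest(vector, codebook)].append(vector)
        for i, cell in enumerate(cells):
            if cell:                      # an empty cell keeps its old code vector
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook
```

With separate voiced and unvoiced training sets, the same routine would simply be run twice to produce the two codebooks.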

[0055] When voiced and unvoiced sounds are distinguished as described above, their autocorrelations are collected separately, and a codebook is created for each, the codebook is consulted during the extension of α in the bandwidth extension processing. At that point, too, a voiced/unvoiced decision is made and the corresponding codebook is used.

[0056] The codebook may also be created without distinguishing between voiced and unvoiced sounds.

[0057] As described above, according to this embodiment, reducing the number of wideband formants places the emphasis on the rough structure of the spectrum and improves the quality of the resulting wideband speech. It also saves memory capacity and the amount of computation required for the codebook search.

[0058] The parameters capable of representing formants are not limited to the linear prediction coefficients α and the autocorrelation r used in this embodiment; various others, such as line spectrum pairs (LSP) and partial autocorrelation coefficients (PARCOR coefficients), can be used. The present invention is not limited to predicting the high band from the low band, nor to the PDC system. Furthermore, even where analog speech is transmitted, the invention can be used as is provided digital signal processing follows, so it is not limited to systems that transmit parameters. The invention is equally applicable where no transmission path is involved, such as the answering machine, response message, and memo recording functions of a portable terminal.

[0059]

[Effects of the Invention] As is clear from the above description, in a bandwidth extension technique that predicts and synthesizes wideband speech from narrowband speech or narrowband parameters, the present invention reduces the number of formants of the synthesized wideband speech. This allows the rough structure of the spectrum to be emphasized and the quality of the resulting wideband speech to be improved, and it also saves memory capacity and the amount of computation required for the codebook search.

[Brief description of the drawings]

FIG. 1 is a block diagram of an embodiment in which the present invention is applied to the PSI-CELP scheme.

FIG. 2 is a block diagram of an embodiment in which the present invention is applied to the VSELP scheme.

FIG. 3 is a block diagram for generating the training data.

FIG. 4 is a block diagram for generating the codebook.

[Explanation of symbols]

2: V/UV discrimination unit
4: every-other-order (1次おき) unit
5: up-sampling unit
7: parameter conversion unit
8: vector quantization unit
9: vector inverse quantization unit
10: parameter inverse conversion unit
11: LPC synthesis unit
12: narrowband frequency range removal unit
13: high-frequency suppression filter unit
16: multiplication unit
17: addition unit
20, 21: zero-padding units
22: frame energy comparison unit
23: pitch component detection unit
31: framing unit
32: V/UV classification unit
33, 34: autocorrelation calculation units
41: wideband parameter extraction unit
42: wideband codebook learning unit

─────────────────────────────────────────────────────

[Procedural amendment]

[Submission date] December 9, 1998 (1998.12.9)

[Procedural amendment 1]

[Document to be amended] Specification

[Item to be amended] Claim 5

[Method of amendment] Change

[Content of amendment]

[Procedural amendment 2]

[Document to be amended] Specification

[Item to be amended] Claim 7

[Method of amendment] Change

[Content of amendment]

[Procedural amendment 3]

[Document to be amended] Specification

[Item to be amended] 0012

[Method of amendment] Change

[Content of amendment]

[0012] The speech synthesis method of the present invention solves the above-described problem by having a parameter prediction step of predicting, from those of the narrowband parameters representing the input narrowband speech that can express narrowband formant information, parameters capable of expressing a number of wideband formants not greater than the number of narrowband formants, and a synthesis step of synthesizing wideband speech from the obtained parameters expressing the wideband formants.
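The synthesis step can be pictured as an all-pole (LPC synthesis) filter driven by an excitation signal. The following minimal sketch assumes the coefficient convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p and uses a unit impulse as a stand-in excitation; the function name `lpc_synthesize` and the sample coefficients are hypothetical and are not taken from the specification.

```python
def lpc_synthesize(excitation, a):
    """Filter an excitation through the all-pole filter 1 / A(z).

    a[0] is assumed to be 1; the output obeys
    y[n] = e[n] - sum_{i=1..p} a[i] * y[n - i].
    """
    order = len(a) - 1
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for i in range(1, order + 1):
            if n - i >= 0:
                acc -= a[i] * out[n - i]
        out.append(acc)
    return out


# Impulse response of a one-pole filter (illustrative values only).
response = lpc_synthesize([1.0, 0.0, 0.0, 0.0], [1.0, -0.5])
print(response)
```

In a real decoder the excitation would come from the decoded narrowband excitation rather than from an impulse; that part is outside the scope of this sketch.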

[Procedural amendment 4]

[Document to be amended] Specification

[Item to be amended] 0013

[Method of amendment] Change

[Content of amendment]

[0013] The speech synthesis apparatus of the present invention solves the above-described problem by having parameter prediction means for predicting, from those of the narrowband parameters representing the input narrowband speech that can express narrowband formant information, parameters capable of expressing a number of wideband formants not greater than the number of narrowband formants, and synthesis means for synthesizing wideband speech from the obtained parameters expressing the wideband formants.

[Procedural amendment 5]

[Document to be amended] Specification

[Item to be amended] 0042

[Method of amendment] Change

[Content of amendment]

[0042]

(Equation 5)

Claims


1. A voice band expansion method comprising: a parameter extraction step of obtaining, from an input narrowband speech signal, parameters capable of expressing narrowband formants; a parameter prediction step of predicting parameters capable of expressing a number of wideband formants not greater than the number of narrowband formants obtained; and a synthesis step of synthesizing wideband speech from the obtained parameters expressing the wideband formants.

2. The voice band expansion method according to claim 1, further comprising a replacement step of removing, from the synthesized wideband speech signal, the frequency range corresponding to the narrowband speech signal and replacing it with the input narrowband speech signal.

3. A voice band expansion device comprising: parameter extraction means for obtaining, from an input narrowband speech signal, parameters capable of expressing narrowband formants; parameter prediction means for predicting parameters capable of expressing a number of wideband formants not greater than the number of narrowband formants obtained; and synthesis means for synthesizing wideband speech from the obtained parameters expressing the wideband formants.

4. The voice band expansion device according to claim 3, further comprising replacement means for removing, from the synthesized wideband speech signal, the frequency range corresponding to the narrowband speech signal and replacing it with the input narrowband speech signal.

5. A speech synthesis method comprising: a first parameter prediction step of predicting, from those of the narrowband parameters representing input narrowband speech that can express narrowband formant information, parameters capable of expressing a number of wideband formants not greater than the number of narrowband formants; a parameter extraction step of obtaining, from input narrowband speech, parameters capable of expressing narrowband formant information; a second parameter prediction step of predicting parameters capable of expressing a number of wideband formants not greater than the number of narrowband formants obtained; and a synthesis step of synthesizing wideband speech from the obtained parameters expressing the wideband formants.

6. The speech synthesis method according to claim 5, further comprising a replacement step of removing, from the synthesized wideband speech signal, the frequency range corresponding to the narrowband speech signal and replacing it with the input narrowband speech signal.

7. A speech synthesis apparatus comprising: first parameter prediction means for predicting, from those of the narrowband parameters representing input narrowband speech that can express narrowband formant information, parameters capable of expressing a number of wideband formants not greater than the number of narrowband formants; parameter extraction means for obtaining, from input narrowband speech, parameters capable of expressing narrowband formant information; second parameter prediction means for predicting parameters capable of expressing a number of wideband formants not greater than the number of narrowband formants obtained; and synthesis means for synthesizing wideband speech from the obtained parameters expressing the wideband formants.

8. The speech synthesis apparatus according to claim 7, further comprising replacement means for removing, from the synthesized wideband speech signal, the frequency range corresponding to the narrowband speech signal and replacing it with the input narrowband speech signal.
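The replacement step recited in claims 2, 4, 6 and 8 can be illustrated with a small frequency-domain sketch. It assumes a 2:1 ratio between the wideband and narrowband sampling rates and processes a single frame with a naive DFT; the embodiment's own filters, gain control, and high-frequency suppression are not reproduced here, and the function names (`dft`, `idft`, `replace_narrow_band`) and test signals are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def replace_narrow_band(wide_synth, narrow_in):
    """Keep the synthesized high band and take the low band from the input.

    wide_synth: N samples at the wideband rate (assumed twice the narrowband rate).
    narrow_in:  N/2 samples of the original narrowband signal.
    """
    n = len(wide_synth)
    if len(narrow_in) * 2 != n:
        raise ValueError("this sketch assumes a 2:1 sampling-rate ratio")
    # Up-sample the narrowband input by zero stuffing.
    stuffed = [0.0] * n
    stuffed[::2] = narrow_in
    wide_spec = dft(wide_synth)
    narrow_spec = dft(stuffed)
    cutoff = n // 4  # bin of the narrowband Nyquist frequency
    out_spec = []
    for k in range(n):
        freq_bin = min(k, n - k)  # fold negative-frequency bins
        if freq_bin < cutoff:
            # Low band: original narrowband signal (gain 2 offsets zero stuffing).
            out_spec.append(2.0 * narrow_spec[k])
        else:
            # High band: synthesized wideband signal.
            out_spec.append(wide_spec[k])
    return idft(out_spec)


# Illustrative frame: the low-band tone in `wide` is discarded and replaced.
N = 32
narrow = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
wide = [0.3 * math.cos(2 * math.pi * 3 * t / N)
        + 0.5 * math.cos(2 * math.pi * 12 * t / N) for t in range(N)]
combined = replace_narrow_band(wide, narrow)
```

Taking the low band from the input keeps the original narrowband content intact, so only the band that was actually missing comes from the predicted parameters.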