JP5325130B2 - LPC analysis device, LPC analysis method, speech analysis/synthesis device, speech analysis/synthesis method, and program - Google Patents

Info

Publication number: JP5325130B2
Authority: JP (Japan)
Prior art keywords: lpc, signal, speech, analysis, pitch mark
Legal status: Active
Application number: JP2010012963A
Other languages: Japanese (ja)
Other versions: JP2011150232A (en)
Inventor: Sadao Hiroya (廣谷定男)
Current Assignee: Nippon Telegraph and Telephone Corp
Original Assignee: Nippon Telegraph and Telephone Corp
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2010012963A
Publication of JP2011150232A
Application granted
Publication of JP5325130B2
Status: Active


Description

The present invention relates to a speech analysis/synthesis apparatus and a speech analysis/synthesis method that extract a source signal and a vocal tract spectrum from a speech signal and synthesize speech by driving a vocal tract filter with the source signal, and in particular to an LPC analysis apparatus and an LPC analysis method that extract a vocal tract spectrum unaffected by the fundamental frequency.

To improve the performance of speech synthesis, coding, and recognition, it has long been considered important to decompose a speech signal efficiently and accurately into a source signal and a vocal tract spectrum, following the human speech production mechanism. Linear predictive (LPC) analysis is widely used for this decomposition, but because it assumes white noise as the source signal, the resulting vocal tract spectrum is noticeably affected by the fundamental frequency (F0). In female voices in particular, the fundamental frequency is high and the white-noise assumption is not satisfied, so the vocal tract spectrum estimated by LPC analysis contains the fundamental frequency of the source signal and its harmonics, making it difficult to obtain an accurate vocal tract spectrum.

To avoid this fundamental-frequency problem in LPC analysis, a sample-selective linear prediction method has been proposed (see, for example, Non-Patent Document 1). Speech samples at times where the LPC residual signal, obtained by passing the speech signal through the LPC inverse filter, becomes large do not satisfy the white-noise source assumption, so this method excludes those samples and performs LPC analysis on the rest. This is expected to yield a vocal tract spectrum with little influence from the fundamental frequency; however, because the method does not take the phase characteristics of the speech signal into account, it can select the wrong samples to exclude, and the accuracy of the resulting vocal tract spectrum is insufficient. Moreover, since samples are discarded, the amount of data available for the analysis may become insufficient.

To address the problem of the phase characteristics contained in the speech signal, a method called AR-HMM has been proposed (see, for example, Non-Patent Document 2). By modeling the source with an HMM, this method enables LPC analysis that is robust to the fundamental frequency; however, because the source model is complex and includes phase characteristics, the computation is slow and a stable solution is difficult to obtain.

[Non-Patent Document 1] Yoshiaki Miyoshi, Kazuharu Yamato, Masuzo Yanagida, and Osamu Kakusho, "Analysis of high-pitched speech by a two-stage sample-selective linear prediction method," IEICE Transactions, Vol. J70-A, No. 8, pp. 1146-1156, 1987.
[Non-Patent Document 2] Akira Sasou and Kazuyo Tanaka, "Modeling of a sound source with an HMM and vocal tract characteristic extraction robust to high fundamental frequencies," IEICE Transactions, Vol. J84-D-II, No. 9, pp. 1960-1969, 2001.

In view of the above, an object of the present invention is to efficiently obtain, from a speech signal, an accurate and stable vocal tract spectrum that is not affected by the fundamental frequency.

According to this invention, an LPC analysis apparatus takes as input a phase-equalized speech signal and a set of pitch mark times. The source signal is assumed to consist of a single pulse of amplitude G at each pitch mark time and white noise at all other times, and the LPC coefficients and the amplitude G are determined so as to minimize the error between the phase-equalized speech signal and the speech signal obtained from the LPC coefficients and the source signal.

A speech analysis/synthesis apparatus according to this invention comprises: a speech-section detection unit that detects speech sections in an input speech signal; a fundamental-frequency analysis unit that estimates the fundamental frequency from the speech signal in each speech section; a first LPC analysis unit that windows the speech signal with a window length determined from the fundamental frequency, performs LPC analysis, and obtains an LPC residual signal by passing the speech signal through the LPC inverse filter; a pitch mark analysis unit that generates a pitch waveform corresponding to the fundamental period obtained from the fundamental frequency and extracts a set of pitch mark times using the pitch waveform and the LPC residual signal; a phase-equalized speech generation unit that generates a phase-equalized speech signal by applying to the speech signal a phase equalization filter derived from the pitch mark times and the LPC residual signal; a second LPC analysis unit that, assuming a source signal with a single pulse of amplitude G at each pitch mark time and white noise at all other times, determines the LPC coefficients and the amplitude G so as to minimize the error between the phase-equalized speech signal and the speech signal obtained from the LPC coefficients and the source signal; a multipulse source model generation unit that obtains a pulse gain and a multipulse source model using the pitch mark times, the phase-equalized speech signal, and the LPC coefficients; a white-noise gain generation unit that computes a white-noise gain using the PARCOR coefficients k and the autocorrelation function R obtained during the LPC analysis in the second LPC analysis unit; and a speech synthesis unit that synthesizes a speech signal by convolving the LPC coefficients with a source signal that, outside the speech sections, is white noise multiplied by the white-noise gain and, within the speech sections, is either a multipulse sequence computed from the fundamental frequency, the pulse gain, and the multipulse source model, or a single pulse train computed from the fundamental frequency and the pulse gain.

According to this invention, an accurate and stable vocal tract spectrum that is not affected by the fundamental frequency can be obtained efficiently.

FIG. 1 is a block diagram showing the functional configuration of an embodiment of the speech analysis/synthesis apparatus according to this invention.
FIG. 2 is a flowchart (part 1) showing the flow of processing in the speech analysis/synthesis apparatus shown in FIG. 1.
FIG. 3 is a flowchart (part 2) showing the flow of processing in the speech analysis/synthesis apparatus shown in FIG. 1.
FIG. 4 is a flowchart (part 3) showing the flow of processing in the speech analysis/synthesis apparatus shown in FIG. 1.
FIG. 5 is a graph showing analysis results for a vocal tract spectrum.
FIG. 6 shows analysis results for a vocal tract spectrum sequence.
FIG. 7 shows example waveforms at each stage of the speech analysis/synthesis process.

Embodiments of the present invention will be described by way of example with reference to the drawings.

FIG. 1 shows the functional configuration of an embodiment of the speech analysis/synthesis apparatus according to this invention. In this example, the speech analysis/synthesis apparatus 10 comprises a speech-section detection unit 11, a fundamental-frequency analysis unit 12, a first LPC analysis unit 13, a pitch mark analysis unit 14, a phase-equalized speech generation unit 15, a second LPC analysis unit 16, a multipulse source model generation unit 17, a white-noise gain generation unit 18, and a speech synthesis unit 19.

FIGS. 2 to 4 show the flow of processing in the speech analysis/synthesis apparatus 10 shown in FIG. 1. The function of each unit and the flow of processing are described below with reference to FIGS. 1 to 4.
<Speech-section detection unit>
First, the speech-section detection unit 11 detects speech sections by thresholding the power of the speech signal (original speech) (step S1).
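As a rough illustration of step S1 (the patent specifies only power thresholding, so the frame length, shift, and threshold below are assumptions), power-based speech-section detection can be sketched as:

```python
import math

def detect_speech_sections(signal, frame_len=160, shift=64, threshold=1e-4):
    """Detect speech sections by thresholding short-time power (step S1, sketch).

    Returns (start_frame, end_frame) pairs for runs of frames whose mean
    power exceeds the threshold. All parameter values are illustrative.
    """
    n_frames = max(0, (len(signal) - frame_len) // shift + 1)
    voiced = []
    for t in range(n_frames):
        frame = signal[t * shift: t * shift + frame_len]
        power = sum(x * x for x in frame) / frame_len
        voiced.append(power > threshold)
    # Merge consecutive above-threshold frames into sections
    sections, start = [], None
    for t, v in enumerate(voiced):
        if v and start is None:
            start = t
        elif not v and start is not None:
            sections.append((start, t))
            start = None
    if start is not None:
        sections.append((start, len(voiced)))
    return sections

# Example: 1000 samples of silence followed by a louder sinusoid
sig = [0.0] * 1000 + [0.5 * math.sin(0.2 * n) for n in range(1000)]
print(detect_speech_sections(sig))
```

In a real front end the threshold would typically be adaptive (relative to the noise floor) rather than fixed.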

<Fundamental-frequency analysis unit>
Next, the fundamental-frequency analysis unit 12 estimates the fundamental frequency from the speech signal in each detected speech section using a pitch extraction algorithm. In this embodiment, for example, the fundamental frequency is obtained from the instantaneous frequency amplitude spectrum with an analysis window length (analysis interval) of 30 ms and an analysis shift length of 4 ms (step S2). For the fundamental-frequency analysis, a technique based on the instantaneous frequency amplitude spectrum such as that described in Reference A below can be used.
Reference A: Arifianto, D., Tanaka, T., Masuko, T., and Kobayashi, T., "Robust F0 estimation of speech signal using harmonicity measure based on instantaneous frequency," IEICE Trans. Information and Systems, Vol. E87-D, No. 12, pp. 2812-2820, 2004.

<First LPC analysis unit>
To obtain the LPC residual signal used for phase equalization, the first LPC analysis unit 13 windows the speech signal with a Blackman window whose length is 2.5 times the fundamental period (fundamental period = 1 / fundamental frequency), at an analysis shift length of 4 ms, and performs LPC analysis by the autocorrelation method (step S3). The LPC residual signal is then obtained by passing the speech signal through the LPC inverse filter (step S4).
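A minimal sketch of autocorrelation-method LPC analysis and inverse filtering (steps S3-S4): Blackman windowing, the Levinson-Durbin recursion, and residual computation. The window length and analysis order here are illustrative, not the patent's values.

```python
import math

def blackman(n_samples):
    # Blackman window coefficients
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / (n_samples - 1))
            + 0.08 * math.cos(4 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def autocorr(frame, max_lag):
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations; returns (a, parcor, err).

    Prediction model: s[n] is approximated by -sum(a[i]*s[n-i], i=1..order),
    with a[0] = 1; `parcor` are the reflection (PARCOR) coefficients and
    `err` the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    parcor = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        parcor.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, parcor, err

def lpc_residual(signal, a):
    # LPC inverse filter A(z) applied to the signal (step S4)
    p = len(a) - 1
    return [sum(a[i] * signal[n - i] for i in range(min(p, n) + 1))
            for n in range(len(signal))]

# Example: analyze one frame of a decaying resonance
sig = [0.9 ** n * math.sin(0.3 * n) for n in range(400)]
frame = [s * w for s, w in zip(sig, blackman(len(sig)))]
a, parcor, err = levinson_durbin(autocorr(frame, 10), 10)
res = lpc_residual(frame, a)
print(err, sum(x * x for x in res))
```

The autocorrelation method guarantees a stable synthesis filter (all PARCOR magnitudes below one), which is why the patent can later reuse the same machinery for the second analysis stage.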

In the phase equalization process, a flat LPC residual spectrum is the condition for leaving the spectrum of the speech signal unchanged. In this embodiment, therefore, the LPC analysis order is set high (for example, 50 for male voices and 40 for female voices) in order to flatten the spectrum of the LPC residual signal. A lag window (100 Hz) is also used to avoid the influence of the fundamental frequency.

Furthermore, because power-spectrum analysis with a window function depends on the analysis time, a TANDEM window as described in Reference B below is used to smooth the vocal tract spectrum in the time direction.
Reference B: Masanori Morise, Toru Takahashi, Hideki Kawahara, and Toshio Irino, "Speech analysis using a power spectrum estimation method for periodic signals independent of the analysis time," IEICE Transactions, Vol. J92-A, No. 3, pp. 163-171, 2009.

The TANDEM window estimates a power spectrum that does not depend on the analysis time by averaging the power spectrum of the current analysis frame with that of a frame shifted by half the fundamental period. By the Wiener-Khinchin theorem, the inverse Fourier transform of the power spectrum is the autocorrelation function; therefore, when the TANDEM window is used with autocorrelation-method LPC analysis, the autocorrelation functions of the current frame and the half-period-shifted frame are averaged, and solving the resulting autocorrelation equations with the Durbin algorithm yields the LPC coefficients.
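The autocorrelation-averaging step can be sketched as follows; the averaged values would then be fed to the Durbin recursion exactly as in plain autocorrelation LPC (a sketch of the idea, not the exact implementation of Reference B):

```python
import math

def autocorr(frame, max_lag):
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(max_lag + 1)]

def tandem_autocorr(signal, start, frame_len, half_period, max_lag):
    """Average the autocorrelation functions of the current analysis frame
    and a frame shifted by half the fundamental period (TANDEM window idea),
    giving an estimate that depends less on the analysis time."""
    f0 = signal[start:start + frame_len]
    f1 = signal[start + half_period:start + half_period + frame_len]
    r0 = autocorr(f0, max_lag)
    r1 = autocorr(f1, max_lag)
    return [(x + y) / 2.0 for x, y in zip(r0, r1)]

# Example: a sinusoid with a 20-sample fundamental period (half period = 10)
sig = [math.sin(2 * math.pi * n / 20.0) for n in range(200)]
r = tandem_autocorr(sig, 30, 60, 10, 12)
print(r[0], r[10])
```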

Reference B shows that, when the fundamental period within the analysis frame is constant at T_0, the shift required for the TANDEM window computation may simply be T_0/2. In an actual speech signal, however, the fundamental period within the analysis frame is not constant (T_0, T_1, ...), so in this embodiment the shift is taken as a weighted average in the frequency domain that accounts for the fluctuation of the fundamental period,

    (Equation 1)

where w is a Gaussian weight.

<Pitch mark analysis unit>
To obtain the pitch marks (the set of pitch mark times) used for phase equalization, the pitch mark analysis unit 14 generates, within each speech section, a pulse-train signal (pitch waveform) corresponding to the fundamental period obtained from the fundamental frequency (step S5). With frame number t and time k, the cross-correlation function between the absolute value of the pitch waveform ex(t, k) and the absolute value of the LPC residual signal e(t, k) is computed for each frame t within the speech section:

    r(t, j) = Σ_k |e(t, k)| × |ex(t, k + j)|

The sequence of j that maximizes Σ_t r(t, j) is found by dynamic programming, giving candidate pitch mark times. Then, in the neighborhood of each candidate pitch mark time, the time at which the absolute value of the LPC residual signal is maximal is searched for.

Furthermore, from the obtained pitch mark times, the pitch mark time at which the absolute value of the residual is largest is selected as a starting point, and the times at which the autocorrelation function of the LPC residual signal between adjacent pitch mark times (the modified autocorrelation function) is maximal are searched for sequentially; these are extracted as the final set of pitch mark times (step S6). The differences between adjacent pitch mark times thus obtained may also be used as a new fundamental period.
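A simplified sketch of the pitch-mark search: candidate marks are placed at the fundamental period and each is then snapped to the local maximum of the residual's absolute value. The dynamic-programming alignment over the cross-correlation r(t, j) and the modified-autocorrelation refinement are omitted here, so this is only the skeleton of steps S5-S6.

```python
def pick_pitch_marks(residual, period, search=5):
    """Place candidate marks every `period` samples, then move each to the
    time of maximal |residual| within +/-`search` samples (sketch of
    steps S5-S6; the DP over r(t, j) is omitted)."""
    marks = []
    t = period // 2
    while t < len(residual):
        lo = max(0, t - search)
        hi = min(len(residual), t + search + 1)
        best = max(range(lo, hi), key=lambda n: abs(residual[n]))
        marks.append(best)
        t += period
    return marks

# Example: a residual that is a pulse train with period 50,
# pulses slightly off the nominal candidate positions
res = [0.0] * 200
for n in (27, 77, 127, 177):
    res[n] = 1.0
print(pick_pitch_marks(res, 50))  # -> [27, 77, 127, 177]
```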

<Phase-equalized speech generation unit>
To obtain a phase-equalized speech signal, the phase-equalized speech generation unit 15 uses the pitch marks (pitch mark times) and the LPC residual signal: the values of the LPC residual signal are time-reversed about each pitch mark time and normalized, and a phase equalization filter with these values as its coefficients is constructed. Applying this filter to the speech signal (hereinafter, the original speech) yields the phase-equalized speech signal (step S7).

Here, the coefficients of the phase equalization filter obtained at each pitch mark time are smoothed in time, for example with a first-order low-pass filter, before the filter is applied to the original speech. The number of taps of the phase equalization filter equals the length of the fundamental period. The phase equalization process changes the spectrum of the original speech only slightly, and since human hearing is relatively insensitive to the short-time phase characteristics of a speech signal, the perceptual difference between the original and the phase-equalized speech is also slight. This processing follows, for example, the method described in Japanese Patent No. 2061816 (hereinafter, Patent Document 1): by concentrating the energy of the LPC residual signal in time, it allows the phase contained in the original speech to be approximated as zero.
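The filter construction around one pitch mark can be sketched as follows: the residual segment is time-reversed about the mark and normalized to unit energy, giving FIR coefficients that concentrate the residual energy at the mark. The per-mark coefficient smoothing and the full-fundamental-period tap count are omitted, so the segment length here is an assumption for illustration.

```python
import math

def phase_eq_filter(residual, mark, half_len):
    """FIR coefficients for phase equalization: the LPC residual around the
    pitch mark, time-reversed about the mark and normalized to unit energy
    (sketch of step S7)."""
    seg = residual[mark - half_len: mark + half_len + 1]
    seg = seg[::-1]  # time reversal about the pitch mark
    norm = math.sqrt(sum(x * x for x in seg))
    return [x / norm for x in seg]

def fir_filter(signal, coeffs, delay):
    """Apply the FIR filter, compensating its `delay`-sample group delay."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for i, c in enumerate(coeffs):
            m = n + delay - i
            if 0 <= m < len(signal):
                acc += c * signal[m]
        out.append(acc)
    return out

# Example: an asymmetric residual pulse with a pitch mark at n = 10;
# filtering the residual with its own equalizer concentrates energy there.
res = [0.0] * 21
res[9], res[10], res[11] = 0.3, 1.0, -0.6
coeffs = phase_eq_filter(res, 10, 3)
out = fir_filter(res, coeffs, 3)
print(max(range(len(out)), key=lambda n: out[n]))  # -> 10
```

Because the coefficients are the time-reversed residual, the filtering acts like a matched filter: its output peaks exactly at the pitch mark, which is what drives the residual toward a single pulse train.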

<Second LPC analysis unit>
The second LPC analysis unit 16 performs LPC analysis on the phase-equalized speech signal obtained by the phase-equalized speech generation unit 15.
Here, the phase-equalized speech signal can be assumed to have been generated by passing a source signal consisting of a single pulse train and white noise through the vocal tract filter. That is, assuming a pulse of amplitude G at each of the I + 1 pitch mark times (t_0, t_1, ..., t_I) contained in the analysis frame and, as in conventional LPC analysis, white noise as the source at all other times t, the analysis reduces to the problem of finding, from the phase-equalized speech signal, the LPC coefficients a and the pulse amplitude G that minimize

    (Equation 2)

where r is a weight, e is the LPC residual signal, s is the phase-equalized speech signal, and p is the LPC analysis order.

The phase-equalized speech signal is windowed with a Blackman window whose length is 2.5 times the fundamental period, at an analysis shift length of 4 ms. With r = 0, the formulation is equivalent to the sample-selective linear prediction method with s(t_0), ..., s(t_I) of the phase-equalized speech signal removed; with G = 0 and r = 1, it is equivalent to conventional LPC analysis assuming white noise.

In this embodiment, setting G ≠ 0 and r = 1 avoids the shortage of data caused by removing phase-equalized speech samples, which was the drawback of the sample-selective linear prediction method, and in addition allows the LPC analysis and the source-signal parameter G to be optimized within the same framework. The LPC coefficients that minimize the above criterion are obtained by solving the following simultaneous equations (with r = 1):

    (Equation 3)

These simultaneous equations can be solved efficiently with the Levinson algorithm. Here, the autocorrelation function R of the phase-equalized speech signal is

    (Equation 4)

and the pulse amplitude G is obtained from the PARCOR coefficients k and the autocorrelation function R produced while solving with the Levinson algorithm as

    (Equation 5)

Since equation (3) is an extension of the autocorrelation method, the TANDEM window can be applied here as well to smooth the vocal tract spectrum in the time direction. In that case, the value of the cross-correlation function involving G on the right-hand side is taken as the average of the cross-correlation functions of the current analysis frame and the analysis frame shifted by the amount given by equation (1).

The initial value of G is obtained by evaluating equation (5) with the PARCOR coefficients k and the autocorrelation function R obtained while solving the Durbin algorithm for the phase-equalized speech signal; the determination of the LPC coefficients (step S9) and the determination of G (step S8) are then repeated (for example, five times). This iteration is expected to improve the estimation accuracy of the resulting parameters. In this embodiment, the phase-equalized speech signal can be assumed to be generated from a comparatively simple source, so the problem to be solved is simple; as a result, the iteration is fast and a stable solution is easy to obtain. The analysis order of the LPC analysis here may differ from the order used for the phase equalization process. Also, since this analysis reduces the influence of the fundamental frequency, no lag window is needed.

Outside the speech sections, the LPC coefficients and the white-noise gain are obtained with a fixed window length of 15 ms and a fixed frame shift of 4 ms. This is equivalent to setting I = 0 in the LPC analysis method of this embodiment.

<Multipulse source model generation unit>
Within the speech sections, the multipulse source model generation unit 17 finds the pulse amplitude (pulse gain) and the parameters of the multipulse source model (the FIR filter coefficients v_k) that minimize the perceptually weighted error with respect to the phase-equalized speech signal (step S10). The transfer characteristic of the FIR filter (6 taps) is expressed, as in Patent Document 1, as

    (equation not reproduced)

The phase-equalized pulse source is computed pitch-synchronously, taking each pitch mark time as the analysis start point and one fundamental period as the analysis window length. In this embodiment, pitch-synchronous analysis is used for analysis, but a 4 ms frame shift is used for synthesis, so the pitch mark times and the start times of the fixed-length frames do not coincide. The parameters for each frame are therefore obtained by linear interpolation. Because this embodiment yields an LPC spectrum that is smooth in both time and frequency, the source-model parameters obtained by this computation are also expected to vary smoothly in time.
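The mapping from pitch-synchronous parameters to the fixed 4 ms synthesis frames can be sketched as plain linear interpolation; the function and variable names below are illustrative, not from the patent.

```python
def interpolate_params(mark_times, mark_values, frame_times):
    """Linearly interpolate scalar parameters measured at pitch mark times
    onto fixed-length frame start times (sketch; values outside the pitch
    mark range are held constant)."""
    out = []
    for t in frame_times:
        if t <= mark_times[0]:
            out.append(mark_values[0])
        elif t >= mark_times[-1]:
            out.append(mark_values[-1])
        else:
            # find the pair of pitch marks surrounding this frame time
            for i in range(len(mark_times) - 1):
                t0, t1 = mark_times[i], mark_times[i + 1]
                if t0 <= t <= t1:
                    w = (t - t0) / (t1 - t0)
                    out.append((1 - w) * mark_values[i] + w * mark_values[i + 1])
                    break
    return out

# Example: a parameter measured at pitch marks 0 and 100, resampled
# onto frame start times every 25 samples
print(interpolate_params([0, 100], [1.0, 3.0], [0, 25, 50, 75, 100]))
# -> [1.0, 1.5, 2.0, 2.5, 3.0]
```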

<White-noise gain generation unit>
The white-noise gain generation unit 18 computes the white-noise gain using the PARCOR coefficients k and the autocorrelation function R obtained when the second LPC analysis unit 16 solves the Levinson algorithm (step S11).
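In standard autocorrelation LPC, the prediction-error (noise) energy relates to the zero-lag autocorrelation and the PARCOR coefficients as E_p = R(0) * (1 - k_1^2) * ... * (1 - k_p^2). The patent does not give its exact gain formula, so the following is a sketch built on that standard relation; the per-sample normalization is an assumption.

```python
import math

def white_noise_gain(r0, parcor, frame_len):
    """Per-sample white-noise gain from the zero-lag autocorrelation r0 and
    the PARCOR coefficients (standard LPC residual-energy relation;
    the exact normalization used in the patent is not specified)."""
    err = r0
    for k in parcor:
        err *= (1.0 - k * k)  # prediction-error energy after each stage
    return math.sqrt(err / frame_len)

print(white_noise_gain(160.0, [0.5, -0.25], 160))
```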

<Speech synthesis unit>
The speech synthesis unit 19 synthesizes speech by convolving the LPC coefficients with the source signal (step S13). Outside the speech sections, the source signal is white noise multiplied by the white-noise gain. Within the speech sections, it is either a multipulse sequence computed from the fundamental frequency, the pulse gain, and the multipulse source model, or a single pulse train computed from the fundamental frequency and the pulse gain (step S12).
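Synthesis (step S13) amounts to driving the all-pole vocal tract filter 1/A(z) with the excitation; a minimal sketch, using the same coefficient convention a[0] = 1 as in the analysis:

```python
def lpc_synthesize(excitation, a):
    """Drive the all-pole vocal tract filter 1/A(z) with the excitation:
    s[n] = e[n] - sum(a[i] * s[n-i], i=1..p), with a[0] = 1 (sketch)."""
    p = len(a) - 1
    s = []
    for n, e in enumerate(excitation):
        acc = e
        for i in range(1, min(p, n) + 1):
            acc -= a[i] * s[n - i]
        s.append(acc)
    return s

# Example: a single pulse through a one-pole filter gives a decaying exponential
out = lpc_synthesize([1.0] + [0.0] * 4, [1.0, -0.5])
print(out)  # -> [1.0, 0.5, 0.25, 0.125, 0.0625]
```

This recursion is exactly the inverse of the LPC inverse filter used in the analysis stage, so passing a signal through A(z) and then 1/A(z) reproduces it.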

[Modification]
The original speech or the phase-equalized speech signal may be passed through band-pass filters of 0-500, 500-1000, 1000-2000, 2000-3000, 3000-4000, 4000-5000, 5000-6000, 6000-7000, and 7000-8000 Hz, and the autocorrelation function of each filtered signal computed and taken as a voiced intensity. As above, this computation is performed pitch-synchronously, taking each pitch mark time as the analysis start point. At synthesis time, based on the voiced intensity, bands above a threshold are treated as voiced and bands below it as unvoiced; a driving source is created with a multipulse sequence or single pulse train in the voiced bands and white noise mixed in in the unvoiced bands, and this is convolved with the LPC coefficients.

Speech may also be synthesized after transforming the parameters obtained by the analysis. For example, the fundamental frequency can be halved or doubled, the formant frequencies obtained by LPC analysis can be shifted arbitrarily, or the time axis can be doubled or halved.

[Experimental example]
FIG. 5 shows the LPC spectrum of /i/ in "rise" uttered by a female native speaker of English. In this experiment, the LPC analysis order for the phase equalization process was 40, and the analysis order for obtaining the LPC spectrum was 20. The speech sampling rate is 16 kHz. The thin line in FIG. 5 was obtained by conventional LPC analysis with a lag window, and the thick line by the LPC analysis method according to this invention (the proposed method).

While the conventional method picks up the fundamental frequency as part of the vocal tract spectrum, the proposed method extracts the first formant frequency (F1) accurately without picking up the fundamental frequency. With the improvement of F1, the amplitude of F2 can also be seen to recover.

FIG. 6 shows LPC spectra of the utterance "udemae" (skill) by a male speaker: (A) without the TANDEM window (conventional method) and (B) with the TANDEM window (proposed method). The LPC analysis order for the phase equalization process and for the LPC spectrum was 50. (A) shows a temporally discontinuous spectrum (vertical striping is visible), whereas (B) shows that a temporally continuous spectrum is obtained.

FIG. 7 shows example waveforms of part of the utterance material 「腕前」 ("udemae"). FIG. 7(A) is the speech signal sampled at 16 kHz; FIG. 7(B) is the LPC residual signal obtained by performing LPC analysis and passing the speech signal through the LPC inverse filter; FIG. 7(C) is the phase-equalized residual signal obtained by applying the phase equalization filter, derived from the LPC residual signal, to the LPC residual signal; FIG. 7(D) is the phase-equalized speech signal obtained by applying the phase equalization filter to the speech signal; and FIG. 7(E) is the synthesized speech signal.

In FIG. 7(B), the influence of the phase characteristics is still visible in the LPC residual signal, but the phase-equalized signal of FIG. 7(C) can be regarded as a mixture of a single pulse train and white noise. Furthermore, although the waveform of FIG. 7(D) differs from that of FIG. 7(A), it has been confirmed that almost no perceptual difference can be heard. FIG. 7(E) is the synthesized speech; its waveform is almost identical to that of FIG. 7(D), demonstrating the effectiveness of the proposed method. It was also confirmed that there is almost no perceptual difference between the speech synthesized by the proposed method and the original speech.
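The residual of FIG. 7(B) is produced by inverse filtering, i.e. running the speech signal through A(z) = 1 + a1·z^-1 + ... + ap·z^-p so that the output approximates the excitation (source) signal. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def lpc_residual(x, a):
    """Pass signal x through the LPC inverse filter A(z).

    a: LPC coefficients [1, a1, ..., ap].
    Returns e with e[n] = x[n] + a1*x[n-1] + ... + ap*x[n-p].
    """
    p = len(a) - 1
    e = np.zeros_like(x, dtype=float)
    for n in range(len(x)):
        # Truncate at the signal boundary for the first p samples.
        e[n] = sum(a[i] * x[n - i] for i in range(min(n, p) + 1))
    return e
```

Driving the all-pole synthesis filter 1/A(z) with this residual reproduces the original signal, which is the sense in which FIG. 7(E) can match FIG. 7(D).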

The speech analysis/synthesis apparatus and speech analysis/synthesis method described above can be realized by a computer and a program installed in the computer. By executing the installed program, the computer functions as the speech analysis/synthesis apparatus.

Note that the second LPC analysis unit in the speech analysis/synthesis apparatus 10 shown in FIG. 1 may be handled as a stand-alone LPC analysis apparatus.

Claims (7)

1. An LPC analysis apparatus which receives a phase-equalized speech signal and a pitch mark time group as inputs, wherein
a sound source signal has a single pulse of amplitude G at each pitch mark time of the pitch mark time group and consists of white noise at times other than the pitch mark times, and
the apparatus is configured to obtain LPC coefficients and the amplitude G, using PARCOR coefficients k and an autocorrelation function R, so that an error between a speech signal obtained from the LPC coefficients and the sound source signal and the phase-equalized speech signal is minimized.
2. An LPC analysis method which receives a phase-equalized speech signal and a pitch mark time group as inputs, wherein
a sound source signal has a single pulse of amplitude G at each pitch mark time of the pitch mark time group and consists of white noise at times other than the pitch mark times, and
the method obtains LPC coefficients and the amplitude G, using PARCOR coefficients k and an autocorrelation function R, so that an error between a speech signal obtained from the LPC coefficients and the sound source signal and the phase-equalized speech signal is minimized.
3. A speech analysis/synthesis apparatus comprising:
a speech section detection unit that detects speech sections of an input speech signal;
a fundamental frequency analysis unit that estimates a fundamental frequency from the speech signal for the speech sections;
a first LPC analysis unit that cuts out the speech signal with a window length determined based on the fundamental frequency, performs LPC analysis, and obtains an LPC residual signal by passing the speech signal through an LPC inverse filter;
a pitch mark analysis unit that generates a pitch waveform corresponding to a fundamental period obtained from the fundamental frequency and extracts a pitch mark time group using the pitch waveform and the LPC residual signal;
a phase-equalized speech generation unit that generates a phase-equalized speech signal by applying, to the speech signal, a phase equalization filter obtained using the pitch mark time group and the LPC residual signal;
a second LPC analysis unit that, taking a sound source signal to have a single pulse of amplitude G at each pitch mark time of the pitch mark time group and to consist of white noise at times other than the pitch mark times, obtains LPC coefficients and the amplitude G, using PARCOR coefficients k and an autocorrelation function R, so that an error between a speech signal obtained from the LPC coefficients and the sound source signal and the phase-equalized speech signal is minimized;
a multi-pulse sound source model generation unit that obtains a pulse gain and a multi-pulse sound source model using the pitch mark time group, the phase-equalized speech signal, and the LPC coefficients;
a white noise gain generation unit that calculates a white noise gain using the PARCOR coefficients k and the autocorrelation function R obtained in the LPC analysis by the second LPC analysis unit; and
a speech synthesis unit that synthesizes a speech signal by convolving the LPC coefficients with a sound source signal which, outside the speech sections, is white noise multiplied by the white noise gain and which, within the speech sections, is a multi-pulse train calculated from the fundamental frequency, the pulse gain, and the multi-pulse sound source model, or a single pulse train calculated from the fundamental frequency and the pulse gain.
4. The speech analysis/synthesis apparatus according to claim 3, wherein the second LPC analysis unit calculates an initial value of the amplitude G using the PARCOR coefficients k and the autocorrelation function R obtained in the LPC analysis by the first LPC analysis unit.
5. A speech analysis/synthesis method comprising:
a speech section detection process of detecting speech sections of an input speech signal;
a fundamental frequency analysis process of estimating a fundamental frequency from the speech signal for the speech sections;
a first LPC analysis process of cutting out the speech signal with a window length determined based on the fundamental frequency, performing LPC analysis, and obtaining an LPC residual signal by passing the speech signal through an LPC inverse filter;
a pitch mark analysis process of generating a pitch waveform corresponding to a fundamental period obtained from the fundamental frequency and extracting a pitch mark time group using the pitch waveform and the LPC residual signal;
a phase-equalized speech generation process of generating a phase-equalized speech signal by applying, to the speech signal, a phase equalization filter obtained using the pitch mark time group and the LPC residual signal;
a second LPC analysis process of, taking a sound source signal to have a single pulse of amplitude G at each pitch mark time of the pitch mark time group and to consist of white noise at times other than the pitch mark times, obtaining LPC coefficients and the amplitude G, using PARCOR coefficients k and an autocorrelation function R, so that an error between a speech signal obtained from the LPC coefficients and the sound source signal and the phase-equalized speech signal is minimized;
a multi-pulse sound source model generation process of obtaining a pulse gain and a multi-pulse sound source model using the pitch mark time group, the phase-equalized speech signal, and the LPC coefficients;
a white noise gain generation process of calculating a white noise gain using the PARCOR coefficients k and the autocorrelation function R obtained in the LPC analysis in the second LPC analysis process; and
a speech synthesis process of synthesizing a speech signal by convolving the LPC coefficients with a sound source signal which, outside the speech sections, is white noise multiplied by the white noise gain and which, within the speech sections, is a multi-pulse train calculated from the fundamental frequency, the pulse gain, and the multi-pulse sound source model, or a single pulse train calculated from the fundamental frequency and the pulse gain.
6. The speech analysis/synthesis method according to claim 5, wherein the second LPC analysis process calculates an initial value of the amplitude G using the PARCOR coefficients k and the autocorrelation function R obtained in the LPC analysis in the first LPC analysis process.
7. A program for causing a computer to function as the speech analysis/synthesis apparatus according to claim 3 or 4.
JP2010012963A 2010-01-25 2010-01-25 LPC analysis device, LPC analysis method, speech analysis / synthesis device, speech analysis / synthesis method, and program Active JP5325130B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2010012963A JP5325130B2 (en) 2010-01-25 2010-01-25 LPC analysis device, LPC analysis method, speech analysis / synthesis device, speech analysis / synthesis method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2010012963A JP5325130B2 (en) 2010-01-25 2010-01-25 LPC analysis device, LPC analysis method, speech analysis / synthesis device, speech analysis / synthesis method, and program

Publications (2)

Publication Number Publication Date
JP2011150232A JP2011150232A (en) 2011-08-04
JP5325130B2 true JP5325130B2 (en) 2013-10-23

Family

ID=44537261

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2010012963A Active JP5325130B2 (en) 2010-01-25 2010-01-25 LPC analysis device, LPC analysis method, speech analysis / synthesis device, speech analysis / synthesis method, and program

Country Status (1)

Country Link
JP (1) JP5325130B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5705086B2 (en) * 2011-10-14 2015-04-22 日本電信電話株式会社 Vocal tract spectrum extraction device, vocal tract spectrum extraction method and program
JP5631915B2 (en) * 2012-03-29 2014-11-26 株式会社東芝 Speech synthesis apparatus, speech synthesis method, speech synthesis program, and learning apparatus
JP6213217B2 (en) * 2013-12-19 2017-10-18 富士通株式会社 Speech synthesis apparatus and computer program for speech synthesis
JP6285823B2 (en) * 2014-08-12 2018-02-28 日本電信電話株式会社 LPC analysis apparatus, speech analysis conversion synthesis apparatus, method and program thereof

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS61107400A (en) * 1984-10-31 1986-05-26 日本電気株式会社 Voice synthesizer
JPS61256400A (en) * 1985-05-10 1986-11-13 株式会社日立製作所 Voice analysis/synthesization system
JPH0782360B2 (en) * 1989-10-02 1995-09-06 日本電信電話株式会社 Speech analysis and synthesis method
JPH05265494A (en) * 1992-03-23 1993-10-15 Idou Tsushin Syst Kaihatsu Kk Speech encoding and decoding device
JP3292711B2 (en) * 1999-08-06 2002-06-17 株式会社ワイ・アール・ピー高機能移動体通信研究所 Voice encoding / decoding method and apparatus
JP4999757B2 (en) * 2008-03-31 2012-08-15 日本電信電話株式会社 Speech analysis / synthesis apparatus, speech analysis / synthesis method, computer program, and recording medium

Also Published As

Publication number Publication date
JP2011150232A (en) 2011-08-04

Similar Documents

Publication Publication Date Title
Bayya et al. Spectro-temporal analysis of speech signals using zero-time windowing and group delay function
Sukhostat et al. A comparative analysis of pitch detection methods under the influence of different noise conditions
Mittal et al. Study of characteristics of aperiodicity in Noh voices
Manfredi et al. Perturbation measurements in highly irregular voice signals: Performances/validity of analysis software tools
Athineos et al. LP-TRAP: Linear predictive temporal patterns
Morise Error evaluation of an F0-adaptive spectral envelope estimator in robustness against the additive noise and F0 error
Roy et al. Precise detection of speech endpoints dynamically: A wavelet convolution based approach
JP5325130B2 (en) LPC analysis device, LPC analysis method, speech analysis / synthesis device, speech analysis / synthesis method, and program
Hanilçi et al. Comparing spectrum estimators in speaker verification under additive noise degradation
Kumar et al. Performance evaluation of a ACF-AMDF based pitch detection scheme in real-time
Mitev et al. Fundamental frequency estimation of voice of patients with laryngeal disorders
Mittal et al. Significance of aperiodicity in the pitch perception of expressive voices
Bouzid et al. Voice source parameter measurement based on multi-scale analysis of electroglottographic signal
Zhao et al. A processing method for pitch smoothing based on autocorrelation and cepstral F0 detection approaches
Ganapathy et al. Robust spectro-temporal features based on autoregressive models of hilbert envelopes
Upadhya Pitch detection in time and frequency domain
Khonglah et al. Speech enhancement using source information for phoneme recognition of speech with background music
Liu et al. Speech enhancement of instantaneous amplitude and phase for applications in noisy reverberant environments
Kaewtip et al. A pitch-based spectral enhancement technique for robust speech processing.
JP5705086B2 (en) Vocal tract spectrum extraction device, vocal tract spectrum extraction method and program
Park et al. Pitch detection based on signal-to-noise-ratio estimation and compensation for continuous speech signal
Park et al. Improving pitch detection through emphasized harmonics in time-domain
Shome et al. Non-negative frequency-weighted energy-based speech quality estimation for different modes and quality of speech
Upadhya et al. Pitch estimation using autocorrelation method and AMDF
Ramesh et al. Glottal opening instants detection using zero frequency resonator

Legal Events

Date Code Title Description

RD03 Notification of appointment of power of attorney
Free format text: JAPANESE INTERMEDIATE CODE: A7423
Effective date: 20110624

A621 Written request for application examination
Free format text: JAPANESE INTERMEDIATE CODE: A621
Effective date: 20120116

A977 Report on retrieval
Free format text: JAPANESE INTERMEDIATE CODE: A971007
Effective date: 20121127

A131 Notification of reasons for refusal
Free format text: JAPANESE INTERMEDIATE CODE: A131
Effective date: 20121211

A521 Request for written amendment filed
Free format text: JAPANESE INTERMEDIATE CODE: A523
Effective date: 20130116

TRDD Decision of grant or rejection written

A01 Written decision to grant a patent or to grant a registration (utility model)
Free format text: JAPANESE INTERMEDIATE CODE: A01
Effective date: 20130709

A61 First payment of annual fees (during grant procedure)
Free format text: JAPANESE INTERMEDIATE CODE: A61
Effective date: 20130719

R150 Certificate of patent or registration of utility model
Ref document number: 5325130
Country of ref document: JP
Free format text: JAPANESE INTERMEDIATE CODE: R150

S531 Written request for registration of change of domicile
Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer
Free format text: JAPANESE INTERMEDIATE CODE: R350