JP2844672B2

JP2844672B2 - Vocal cord and vocal tract type speech analyzer

Info

Publication number: JP2844672B2
Application number: JP1137623A
Authority: JP
Inventors: 幸夫三留
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1989-05-31
Filing date: 1989-05-31
Publication date: 1999-01-06
Anticipated expiration: 2014-01-06
Also published as: JPH032900A

Description

[Detailed Description of the Invention] (Field of Industrial Application) The present invention relates to a vocal-cord/vocal-tract type speech analyzer, and more particularly to a speech analyzer that simultaneously analyzes the vocal cord source wave and the vocal tract characteristics of speech.

(Prior Art) There are three representative methods of analyzing speech signals for speech synthesis and speech recognition. The first conventional method uses linear prediction (the LPC method) or ARMA analysis. It is based on a model in which the speech signal is generated as the response of a linear filter to an impulse train for voiced sounds and to white noise for unvoiced sounds, and it determines the coefficients of that linear filter. Variants of this method, which differ in how the linear filter is constructed, obtain coefficients known as α (alpha) parameters (sometimes called AR functions), PARCOR (partial autocorrelation) coefficients, ARMA coefficients, and so on. These are described in detail in J. D. Markel and A. H. Gray, "Linear Prediction of Speech" (Japanese translation by Suzuki).
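The first conventional method can be illustrated with a minimal sketch, assuming the autocorrelation method solved by the Levinson-Durbin recursion. The text names linear prediction but does not fix an algorithm, so the function names, the sign convention of the coefficients, and the example frame are illustrative assumptions rather than part of the patent.

```python
def autocorrelation(x, order):
    """Return the autocorrelation values r[0..order] of a finite frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the linear prediction normal equations (assumed algorithm).

    Returns (alpha, parcor, err) with alpha[0] == 1 and the all-pole model
    1 / (alpha[0] + alpha[1] z^-1 + ... + alpha[p] z^-p).
    The PARCOR sign convention used here is one of several in use.
    """
    alpha = [1.0] + [0.0] * order
    parcor = []
    err = r[0]  # prediction error energy at order 0
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame: stop instead of dividing by zero
        acc = r[m] + sum(alpha[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        parcor.append(k)
        prev = alpha[:]
        for i in range(1, m):
            alpha[i] = prev[i] + k * prev[m - i]
        alpha[m] = k
        err *= 1.0 - k * k
    return alpha, parcor, err


# Hypothetical frame: impulse response of a one-pole filter with pole 0.5.
frame = [0.5 ** n for n in range(50)]
alpha, parcor, err = levinson_durbin(autocorrelation(frame, 1), 1)
```

The returned `err` is the residual energy of the predictor. It is the kind of quantity that could serve as the "analysis error" mentioned later in the description, although the patent does not define that error precisely.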

The second conventional method was presented by P. Hedelin at the International Conference on Acoustics, Speech, and Signal Processing '86, on pages 465 to 468 of the proceedings, under the title "High Quality Glottal LPC-Vocoding".

As shown in FIG. 3, this method is based on a speech production model in which, for voiced sounds, the vocal cord source wave generated by the vocal cord sound source 101, rather than an impulse train, is input to the linear filter 102. The parameters of the source waveform and the AR coefficients are obtained by optimization so that the squared error (computed at 104) between the speech generated by this model and the actual natural speech (input from signal line 103) is minimized.

As the third conventional example, a further improvement of this method is disclosed in a paper presented by Hedelin at the same conference in 1988, on pages 339 to 342 of the proceedings, under the title "Phase Compensation in All-Pole Speech Analysis".

In this method, a finite number of codes corresponding to various waveforms are prepared in advance, and the vocal cord wave is analyzed by selecting the best one among them. That is, a finite set of source waveforms is prepared and numbered. The analysis first selects the waveform of one code (number) as the source, synthesizes speech, and computes the error from the actual speech. This is repeated for the waveforms of the other codes, and the waveform of the code that gives the smallest error is taken as the estimated vocal cord source waveform.
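The codebook search just described can be sketched as follows. This is a simplified illustration only: it assumes the all-pole filter coefficients are already known and fixed, whereas the cited method also has to determine them, and the names `all_pole_filter`, `search_codebook`, and the small codebook are hypothetical.

```python
def all_pole_filter(excitation, a):
    """Synthesize y[n] = e[n] - sum_{i>=1} a[i] * y[n-i], with a[0] == 1."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for i in range(1, min(len(a), n + 1)):
            acc -= a[i] * y[n - i]
        y.append(acc)
    return y


def search_codebook(speech, codebook, a):
    """Try every numbered source waveform and keep the one with least error."""
    best_code, best_err = None, float("inf")
    for code, waveform in enumerate(codebook):
        synth = all_pole_filter(waveform, a)
        err = sum((s - t) ** 2 for s, t in zip(speech, synth))
        if err < best_err:
            best_code, best_err = code, err
    return best_code, best_err


# Hypothetical three-entry codebook and a one-pole "vocal tract".
codebook = [[1.0, 0.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
tract = [1.0, -0.5]
speech = all_pole_filter(codebook[1], tract)
best_code, best_err = search_codebook(speech, codebook, tract)
```

Every candidate requires a full synthesis and a sample-by-sample comparison, which is the cost the invention sets out to reduce.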

(Problem to be Solved by the Invention) The first conventional method can analyze speech with a small amount of computation. However, when it is applied to speech synthesis and the like, the synthesized speech for voiced sounds is not of very good quality. The problem is thought to lie in assuming an impulse train as the sound source, as described above. Physiological studies have made it clear that vocal cord vibration is very complex and cannot be approximated by simple impulses. This means that the characteristics represented by the analyzed α parameters, ARMA coefficients, and so on are not those of the vocal tract alone but include the characteristics of both the vocal cord source and the vocal tract.

The fact that the impulse train is not a good approximation of the actual sound source, and that the analysis result does not represent the vocal tract characteristics, is not necessarily a fatal defect in coding for speech communication; sound quality can be improved to some extent by, for example, increasing the number of pulses. However, in the synthesis of arbitrary words, where speech is synthesized from a character string according to rules, source parameters such as pitch and the vocal tract parameters must be controlled independently, so a synthesis model closer to the actual speech production process is desired. In coding as well, analysis based on a more accurate model improves efficiency and sound quality.

The second and third conventional methods bring the model closer to actual speech production in this respect. As the voiced sound source, they use a waveform modeled with reference to observations of actual vocal cord vibration made with a scope or similar instrument. These conventional methods can improve the naturalness of voiced sounds and the coding efficiency.

However, because the second and third conventional methods optimize the coefficients while repeatedly evaluating the error, the analysis requires an enormous amount of computation, which makes them difficult to realize in terms of cost and equipment size. Even for the coefficients of the linear filter, whose configuration is the same as in the first conventional method, analysis methods such as conventional linear prediction could not be applied. Moreover, the position of the vocal cord wave on the time axis must also be matched. Because the analysis frame is normally set to at least about three times the pitch period, the positions of the several vocal cord pulses within it must each be determined, which complicates the analysis further.

An object of the present invention is to provide a vocal-cord/vocal-tract type speech analyzer that, like the second and third conventional methods, performs analysis based on a waveform model of the vocal cord source, yet can be realized with less computation than those methods.

(Means for Solving the Problems) To solve the above problems, the vocal-cord/vocal-tract type speech analyzer according to the first invention comprises: an input buffer that temporarily stores an input speech signal; an FIR filter having the inverse of the frequency characteristic of a vocal cord source wave; coefficient generating means that generates coefficients of the FIR filter having the inverse of the frequency characteristics of various vocal cord source waves; spectrum analysis means that analyzes the spectrum of the output signal of the FIR filter; error comparison means that compares the analysis errors successively obtained by the spectrum analysis means with one another; and control means that, when the input speech signal is applied to the FIR filter set with the coefficients for each of the various vocal cord source wave candidates, takes the candidate for which the analysis error obtained by the error comparison means is smallest as the analysis result for the vocal cord source wave, and performs control so that the result of the spectrum analysis at that time is output as the analysis result for the vocal tract characteristics.

The vocal-cord/vocal-tract type speech analyzer according to the second invention comprises: an input buffer that temporarily stores an input speech signal; an IIR filter that receives the buffer output; coefficient generating means that generates a first coefficient of the IIR filter, having the characteristic of the minimum-phase component of the inverse of the frequency characteristics of various vocal cord source waves, and a second coefficient of the IIR filter, having the characteristic obtained by phase-inverting the maximum-phase component of the inverse of the frequency characteristic of the vocal cord source wave; a time-axis inversion buffer that temporarily stores the output of the IIR filter, inverts its time axis, and outputs it to the IIR filter; spectrum analysis means that analyzes the spectrum of the output signal of the IIR filter; error comparison means that compares the analysis errors successively obtained by the spectrum analysis means with one another; and control means. For each of the various vocal cord source wave candidates, the control means has the input speech signal filtered by the IIR filter with the first coefficient, has the time axis of the result inverted by the time-axis inversion buffer, has the time-inverted signal filtered again by the IIR filter with the second coefficient, has the time axis of that result inverted again by the time-axis inversion buffer, and has the resulting signal analyzed by the spectrum analysis means. It compares the analysis errors with one another, takes the candidate for which the analysis error is smallest as the analysis result for the vocal cord source wave, and performs control so that the result of the spectrum analysis at that time is output as the analysis result for the vocal tract characteristics.

(Operation) In the first invention, a filter having the inverse of the frequency characteristic of the vocal cord source wave is realized as an FIR filter. Here, the inverse characteristic means that the amplitude characteristic is the reciprocal and the phase characteristic has its sign reversed. The coefficients of this inverse FIR filter are obtained by taking the Fourier transform of the original vocal cord wave, forming the reciprocal of the amplitude and the sign-reversed phase, and taking the inverse Fourier transform. In a method that selects the best of a set of coded vocal cord waves, as in the third conventional example, the inverse FIR coefficients can likewise be computed in advance for the vocal cord wave of each code, so the amount of computation at analysis time does not increase.
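The coefficient computation described above can be sketched as follows, assuming a discrete Fourier transform over a fixed number of taps. The plain O(N^2) transform, the function names, the tap count, and the two-sample example waveform are illustrative assumptions; the sketch also assumes the spectrum has no zero-valued bin, since the reciprocal would otherwise be undefined.

```python
import cmath


def dft(x, inverse=False):
    """Plain discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def inverse_fir_coefficients(source_wave, n_taps):
    """Taps whose spectrum has reciprocal amplitude and sign-reversed phase."""
    if len(source_wave) > n_taps:
        raise ValueError("n_taps must be at least the waveform length")
    padded = list(source_wave) + [0.0] * (n_taps - len(source_wave))
    inverted = []
    for g in dft(padded):
        amplitude, phase = cmath.polar(g)   # fails on a zero bin (assumed absent)
        inverted.append(cmath.rect(1.0 / amplitude, -phase))
    return [c.real for c in dft(inverted, inverse=True)]


# Hypothetical two-sample source waveform and a 16-tap inverse filter.
source = [1.0, 0.5]
taps = inverse_fir_coefficients(source, 16)
```

Because this runs once per codebook entry before any speech is analyzed, its cost does not affect the per-frame computation.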

If the characteristic of this inverse filter were truly the inverse of the vocal cord wave characteristic, the spectrum of the signal passed through the filter would have the vocal cord wave characteristic cancelled and would represent the vocal tract characteristic alone. The vocal tract characteristic could then be estimated by analysis using, for example, the linear prediction method of the first conventional example. However, the true vocal cord wave characteristic cannot be known in advance, so a correct analysis cannot be obtained in a single computation. The analysis is therefore repeated assuming various vocal cord waves, and the one that gives the smallest error is taken as the optimal analysis result.
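The repeated analysis described above can be sketched as a loop over candidate inverse filters. This is a minimal sketch under stated assumptions: the "analysis error" is taken to be the unnormalized residual energy of an autocorrelation-method linear predictor, and every candidate inverse filter is assumed to have a leading tap of 1 so that the errors are comparable. The patent specifies neither point, and all names are hypothetical.

```python
def fir_filter(x, h):
    """Causal FIR filtering of frame x with taps h."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(x))]


def lpc(x, order):
    """Levinson-Durbin recursion; returns (alpha, residual energy)."""
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    alpha, err = [1.0] + [0.0] * order, r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame
        k = -(r[m] + sum(alpha[i] * r[m - i] for i in range(1, m))) / err
        prev = alpha[:]
        for i in range(1, m):
            alpha[i] = prev[i] + k * prev[m - i]
        alpha[m] = k
        err *= 1.0 - k * k
    return alpha, err


def analyze_frame(frame, inverse_filters, order):
    """Return (code, alpha, error) for the candidate with the least error."""
    best = None
    for code, taps in enumerate(inverse_filters):
        alpha, err = lpc(fir_filter(frame, taps), order)
        if best is None or err < best[2]:
            best = (code, alpha, err)
    return best


# Hypothetical frame: source (1 + 0.5 z^-1) through a one-pole tract (pole 0.8).
frame = [1.0] + [1.3 * 0.8 ** (n - 1) for n in range(1, 64)]
candidates = [
    [0.5 ** n for n in range(16)],      # inverse of a wrong source guess
    [(-0.5) ** n for n in range(16)],   # inverse of the true source
]
code, alpha, err = analyze_frame(frame, candidates, 1)
```

In this toy case the matching candidate leaves a signal that the one-pole predictor models exactly, so its residual energy is the smaller of the two. Whether the minimum always identifies the true source in practice depends on the error measure and the candidate normalization, which are assumptions here.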

In the second invention, the inverse filter having the inverse of the frequency characteristic of the vocal cord source wave is realized as an IIR filter. However, the phase characteristic of the vocal cord wave cannot be represented as minimum phase, so realizing its inverse filter as an IIR filter would require poles outside the unit circle, resulting in an unstable circuit. In other words, the FIR filter of the first invention cannot simply be replaced by an IIR filter. The characteristic of the poles outside the unit circle is therefore realized by filtering with the time axis inverted.

For this purpose, a time-axis inversion buffer is provided that temporarily stores the output of the IIR filter and outputs it with the time axis inverted. First, the signal is filtered by the IIR filter with coefficients corresponding to the minimum-phase component, that is, the poles and zeros inside the unit circle, and the output is time-inverted by the time-axis inversion buffer. The time-inverted signal is then filtered again with coefficients corresponding to the phase-inverted maximum-phase component, that is, the poles and zeros obtained by replacing those outside the unit circle with their reciprocals and thereby mapping them inside the unit circle, and the time axis is inverted once more to return to the normal time axis. This time-inverted filtering cannot be realized for a signal that continues indefinitely, but it can be realized in the present invention because the input speech signal is processed in analysis frames of finite duration. The subsequent processing is the same as in the first invention.
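The two-pass filtering described above can be sketched as follows. For brevity the sketch assumes an all-pole recursive filter (the text speaks of poles and zeros), and the function names, coefficient values, and the example frame are hypothetical.

```python
def all_pole_filter(x, a):
    """Causal recursion y[n] = x[n] - sum_{i>=1} a[i] * y[n-i], a[0] == 1."""
    y = []
    for n, v in enumerate(x):
        acc = v
        for i in range(1, min(len(a), n + 1)):
            acc -= a[i] * y[n - i]
        y.append(acc)
    return y


def two_pass_inverse(frame, a_min, a_max_reflected):
    """Forward pass for the minimum-phase part, reversed pass for the rest.

    a_max_reflected holds the coefficients after the poles outside the unit
    circle have been replaced by their reciprocals, so both passes are stable.
    """
    forward = all_pole_filter(frame, a_min)
    reversed_once = forward[::-1]            # role of the time-axis inversion buffer
    backward = all_pole_filter(reversed_once, a_max_reflected)
    return backward[::-1]                    # back to the normal time axis


# Hypothetical frame: a unit pulse at n = 10 shaped by (1 + 0.5 z^-1)(1 - 0.5 z),
# i.e. one zero inside the unit circle and one outside it (at z = 2).
frame = [0.0] * 32
frame[9], frame[10], frame[11] = -0.5, 0.75, 0.5
restored = two_pass_inverse(frame, [1.0, 0.5], [1.0, -0.5])
```

The reversed pass is possible only because the whole frame is available in the buffer before filtering starts, which matches the remark that the method relies on finite analysis frames.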

In these inventions, the position of the source wave need not be estimated, and the vocal tract characteristics are obtained as values such as α parameters using a fast analysis method such as that of the first conventional example. In this way, the values of the parameters representing the vocal cord and vocal tract characteristics are obtained. The search repeats the error evaluation for various source waves, as in the second and third conventional examples, but the amount of computation per iteration is small.

The characteristic represented by the α parameters and other values obtained in this way is considered, unlike in the first conventional example, to exclude the vocal cord characteristic and to approximate the vocal tract characteristic well. Therefore, by realizing the inverse of this characteristic and filtering the original speech signal with it (a method known as inverse filtering), the vocal cord source wave is obtained as a function of time. This means that not only the shape of the vocal cord wave but also its position on the time axis is obtained. Unlike the second and third conventional examples, in which the position must be estimated simultaneously from the outset, the source waveform including its position information can be estimated easily by inverse filtering after the analysis is finished.
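Inverse filtering with the estimated vocal tract coefficients can be sketched as below, assuming the vocal tract is the all-pole model 1/A(z) given by the α parameters, so that its inverse is the FIR filter A(z). The helper names, the pole value, and the pulse positions are hypothetical.

```python
def all_pole_filter(x, a):
    """Vocal tract model: y[n] = x[n] - sum_{i>=1} a[i] * y[n-i], a[0] == 1."""
    y = []
    for n, v in enumerate(x):
        acc = v
        for i in range(1, min(len(a), n + 1)):
            acc -= a[i] * y[n - i]
        y.append(acc)
    return y


def inverse_filter(speech, a):
    """Apply A(z): e[n] = sum_i a[i] * s[n-i], recovering the source wave."""
    return [sum(a[i] * speech[n - i] for i in range(min(len(a), n + 1)))
            for n in range(len(speech))]


# Hypothetical source: two identical pulses starting at n = 3 and n = 20.
source = [0.0] * 32
for start in (3, 20):
    source[start], source[start + 1] = 1.0, 0.5

tract = [1.0, -0.8]                     # assumed result of the analysis
speech = all_pole_filter(source, tract)
recovered = inverse_filter(speech, tract)
```

The recovered sequence shows both the pulse shape and where each pulse sits in the frame, without any pulse position having been searched for during the analysis itself.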

(Embodiments) A first embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the first invention.

The input buffer 2 temporarily stores the speech signal input from signal line 9 and, in accordance with a control signal from the control circuit 1, sends one analysis frame of the speech signal to the FIR filter 8.

The vocal cord wave candidate setting circuit 6 generates the code of a vocal cord wave candidate and sends it to the FIR coefficient memory 7 and the analysis result buffer 4. When the best code is searched for exhaustively over all codes, the codes may simply be generated in order; when the codes have been arranged in a tree structure in advance, the bits may be set to 1 or 0 in order starting from the MSB.

The FIR coefficient memory 7 stores in advance the coefficient values of FIR filters having the inverse of the frequency characteristics of various vocal cord source waves. When the code of a vocal cord wave candidate is sent from the vocal cord wave candidate setting circuit 6, it sends the coefficient values of the corresponding FIR filter to the FIR filter 8.

The FIR filter 8 filters the speech signal sent from the input buffer 2 using the coefficient values sent from the FIR coefficient memory 7, and sends the result to the linear prediction analysis circuit 3.

The linear prediction analysis circuit 3 performs linear prediction analysis of the signal sent from the FIR filter 8, sends the linear prediction coefficients (α parameters or PARCOR coefficients) to the analysis result buffer 4, and sends the analysis error to the error comparison circuit 5.

The analysis result buffer 4 is a double buffer. The first buffer temporarily stores the code of the vocal cord wave candidate sent from the vocal cord wave candidate setting circuit 6 and the linear prediction coefficients sent from the linear prediction analysis circuit 3; the second buffer stores the current best candidate for the vocal cord wave code and the linear prediction coefficients. When the control circuit 1 sends a control signal instructing an update of the optimum, the contents of the first buffer are copied to the second buffer. Consequently, once all vocal cord wave candidates have been analyzed, the data remaining in the second buffer are the optimal vocal cord wave code and linear prediction coefficients.

The error comparison circuit 5 has a comparator and a memory that holds the current candidate for the minimum analysis error. It compares the analysis error in the memory with the analysis error sent from the linear prediction analysis circuit 3 and, if the latter is smaller, rewrites the memory and notifies the control circuit 1. The memory is reset at the start of each analysis frame.
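The interplay of the double buffer and the error comparison circuit can be sketched in software as two small classes. This is an illustrative model only: the class and method names are hypothetical, and resetting the stored minimum to infinity is an assumption, since the text says only that the memory is reset at the start of each frame.

```python
class ErrorComparator:
    """Models the comparator and minimum-error memory of circuit 5."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Assumed reset value; called at the start of each analysis frame.
        self.min_error = float("inf")

    def offer(self, error):
        """Return True (notify the controller) when a new minimum is found."""
        if error < self.min_error:
            self.min_error = error
            return True
        return False


class AnalysisResultBuffer:
    """Models the double buffer of circuit 4."""

    def __init__(self):
        self.first = None    # result for the candidate under analysis
        self.second = None   # best result so far

    def store(self, code, coefficients):
        self.first = (code, list(coefficients))

    def update_optimum(self):
        self.second = self.first


# Hypothetical per-candidate results: (code, coefficients, analysis error).
results = [(0, [1.0, -0.2], 3.0), (1, [1.0, -0.8], 1.0), (2, [1.0, -0.5], 2.0)]
comparator, buffer = ErrorComparator(), AnalysisResultBuffer()
for code, coefficients, error in results:
    buffer.store(code, coefficients)
    if comparator.offer(error):
        buffer.update_optimum()
best_code, best_coefficients = buffer.second
```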

The control circuit 1 has the vocal cord wave candidate setting circuit 6 generate the code of a vocal cord wave candidate and send it to the FIR coefficient memory 7 and the analysis result buffer 4. Next, it has the FIR filter 8 set the coefficient values sent from the FIR coefficient memory 7 as its filter coefficients. It then has the input buffer 2 send one analysis frame of the signal to the FIR filter 8. It has the linear prediction analysis circuit 3 analyze the filtered signal and send the linear prediction coefficients and the analysis error to the analysis result buffer 4 and the error comparison circuit 5, respectively. When the error comparison circuit 5 reports that the new analysis error is smaller, the control circuit 1 instructs the analysis result buffer 4 to update the optimum. This is repeated for the various source candidates.

When all vocal cord wave candidates have been analyzed, the control circuit 1 has the analysis result buffer 4 output the data remaining in the second buffer on signal line 10.

The above operation is repeated for each analysis frame.

Next, the second invention is described with reference to FIG. 2, which shows an embodiment of it.

In the figure, the input buffer 2, linear prediction analysis circuit 3, analysis result buffer 4, error comparison circuit 5, and vocal cord wave candidate setting circuit 6 operate in the same way as in the first embodiment.

The IIR coefficient memory 11 stores the first coefficient of the IIR filter, which has the characteristic of the minimum-phase component of the inverse of the frequency characteristic of the vocal cord source wave, and the second coefficient of the IIR filter, which has the characteristic obtained by phase-inverting the maximum-phase component of that inverse characteristic. These coefficient values are sent to the IIR filter in accordance with the code of the vocal cord source wave candidate sent from the vocal cord wave candidate setting circuit 6 and a control signal from the control circuit 1 indicating whether the first or the second coefficient is required.

These coefficients can be prepared in advance by the following procedure. First, the complex cepstrum of the vocal cord source wave is computed. The component at the time origin is replaced by the value obtained by subtracting it from 1, and the signs of the components other than the time origin are reversed. The complex cepstrum obtained in this way corresponds to the inverse of the characteristic of the vocal cord source wave. Its positive-time part corresponds to the minimum-phase component. Filter coefficients that realize it are easily obtained by, for example, computing the power spectrum from its Fourier transform, computing the autocorrelation function by the inverse Fourier transform of the power spectrum, and applying linear prediction to the result. Using these as the coefficients of the IIR filter realizes a filter having the minimum-phase component of the inverse characteristic of the vocal cord source wave; this is the first coefficient. The second coefficient is obtained by inverting the time axis of the negative-time part of the complex cepstrum of the inverse characteristic and applying the same processing.

The IIR filter 12 filters the signal sent from the input buffer 2 or the time-axis inversion buffer 13 and outputs it to the time-axis inversion buffer 13. The input signal to this filter is selected by switch 14 in accordance with a control signal from the control circuit 1.

The time-axis inversion buffer 13 temporarily stores the signal sent from the IIR filter 12 and outputs it with time inverted. This signal is sent to the IIR filter 12 or to the linear prediction analysis circuit 3, selected by switching switch 15 with a control signal from the control circuit 1.

The control circuit 1 first has the vocal cord wave candidate setting circuit 6 generate the code of a vocal cord wave candidate and send it to the IIR coefficient memory 11 and the analysis result buffer 4. At the same time, it instructs the IIR coefficient memory 11 to output the first coefficient. Next, it has the IIR filter 12 set the coefficient values sent from the IIR coefficient memory 11 as its filter coefficients. It then switches switch 14 to the input buffer 2 side and has the input buffer 2 send one analysis frame of the signal to the IIR filter 12. The filtered signal is sent to the time-axis inversion buffer 13. Next, for the same source wave code, it instructs the IIR coefficient memory 11 to output the second coefficient and has the IIR filter 12 set it. It switches switches 14 and 15 so that the signal flows from the time-axis inversion buffer 13 to the IIR filter 12, and has the refiltered signal sent to the time-axis inversion buffer 13. It then switches switch 15 so that the output signal of the time-axis inversion buffer 13 is sent to the linear prediction analysis circuit 3 for analysis.

Thereafter, each part is controlled in the same way as in the embodiment shown in FIG. 1, and the analysis result buffer 4 outputs the optimal analysis result data remaining in the second buffer on signal line 10.

Compared with the embodiment shown in FIG. 1, this embodiment requires an additional time-axis inversion buffer, but because an IIR filter can be used, less computation is needed.

(Effects of the Invention) As described above, the present invention performs analysis based on a waveform model of the vocal cord source yet can be realized with less computation than the conventional methods that do so. In addition, the source waveform including its position information can be estimated easily by inverse filtering after the analysis is finished.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the first invention, FIG. 2 is a block diagram showing an embodiment of the second invention, and FIG. 3 is a block diagram for explaining a conventional method.

1: control circuit, 2: input buffer, 3: linear prediction analysis circuit, 4: analysis result buffer, 5: error comparison circuit, 6: vocal cord wave candidate setting circuit, 7: FIR coefficient memory, 8: FIR filter, 11: IIR coefficient memory, 12: IIR filter, 13: time-axis inversion buffer, 14, 15: switches.

Claims

(57) [Claims]

1. A vocal-cord/vocal-tract type speech analyzer for analyzing the vocal cord source wave and the vocal tract characteristics of speech, comprising: an input buffer that temporarily stores an input speech signal; an FIR filter having the inverse of the frequency characteristic of a vocal cord source wave; coefficient generating means that generates coefficients of the FIR filter having the inverse of the frequency characteristics of various vocal cord source waves; spectrum analysis means that analyzes the spectrum of the output signal of the FIR filter; error comparison means that compares the analysis errors successively obtained by the spectrum analysis means with one another; and control means that, when the input speech signal is applied to the FIR filter set with the coefficients for each of the various vocal cord source wave candidates, takes the candidate for which the analysis error obtained by the error comparison means is smallest as the analysis result for the vocal cord source wave, and performs control so that the result of the spectrum analysis at that time is output as the analysis result for the vocal tract characteristics.

2. A vocal-cord/vocal-tract type speech analyzer for analyzing the vocal cord source wave and the vocal tract characteristics of speech, comprising: an input buffer that temporarily stores an input speech signal; an IIR filter that receives the buffer output; coefficient generating means that generates a first coefficient of the IIR filter, having the characteristic of the minimum-phase component of the inverse of the frequency characteristics of various vocal cord source waves, and a second coefficient of the IIR filter, having the characteristic obtained by phase-inverting the maximum-phase component of the inverse of the frequency characteristic of the vocal cord source wave; a time-axis inversion buffer that temporarily stores the output of the IIR filter, inverts its time axis, and outputs it to the IIR filter; spectrum analysis means that analyzes the spectrum of the output signal of the IIR filter; error comparison means that compares the analysis errors successively obtained by the spectrum analysis means with one another; and control means that, for each of the various vocal cord source wave candidates, has the input speech signal filtered by the IIR filter with the first coefficient, has the time axis of the filtered signal inverted by the time-axis inversion buffer, has the time-inverted signal filtered again by the IIR filter with the second coefficient, has the time axis of that signal inverted again by the time-axis inversion buffer, and has the resulting signal analyzed by the spectrum analysis means, compares the resulting analysis errors with one another, takes the candidate for which the analysis error is smallest as the analysis result for the vocal cord source wave, and performs control so that the result of the spectrum analysis at that time is output as the analysis result for the vocal tract characteristics.