JPS62174798A

JPS62174798A - Voice analyzer

Info

Publication number: JPS62174798A
Application number: JP61227286A
Authority: JP
Inventors: 祐輔塚原; 益田　斉; 山口　幹郎; 昌男田邊
Original assignee: Toppan Printing Co Ltd
Current assignee: Toppan Inc
Priority date: 1985-10-16
Filing date: 1986-09-26
Publication date: 1987-07-31

Abstract

(57) [Summary] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of Industrial Application) The present invention relates to a speech analysis device that analyzes an input speech signal on the basis of its spectral envelope.

(Prior Art) In speech recognition devices, vocalization training devices for the hearing impaired, communication systems based on speech analysis and synthesis, speech synthesizers, and the like, the input speech signal must be analyzed and its features extracted in order to carry out the intended processing. The input speech signal is generally analyzed on the basis of its frequency spectrum.

This is because human hearing is more sensitive to the spectrum of a speech signal than to its time-domain waveform, and perceives signals with the same spectral shape as the same phoneme.

In voiced segments, a speech signal has the structure of a periodic signal driven by vocal fold vibration. As a result, the frequency spectrum of a voiced segment has a line-spectrum structure. In unvoiced segments, by contrast, the speech signal is not accompanied by vocal fold vibration; its sound source is instead the noise produced by airflow passing through the vocal tract. As a result, the frequency spectrum of an unvoiced segment has no periodic structure such as a line spectrum. Correspondingly, conventional speech analysis includes methods that assume a periodic pulse generator as the sound source of the input speech signal and methods that assume a noise source. The former is well known from speech analysis with the autoregressive (AR) model, and the latter from speech analysis with the cepstrum. These analyses yield the so-called spectral envelope, that is, the spectrum of the input speech signal with its fine structure removed.

When an input speech signal is analyzed by the AR-model method, the cepstral method, or a similar method to obtain its spectral envelope, these methods assume that the system is stationary in time, so strictly speaking they cannot be applied when the phoneme changes over time. These methods therefore cut out a short time segment over which the system can be regarded as not changing significantly, and multiply it by a window function such as a Hamming or Hanning window to suppress end-point effects, which produces a quasi-stationary signal. The spectral envelope obtained by analyzing this signal is taken as the spectral envelope at the time the segment was cut out.
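The segment-and-window step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the 200 Hz test tone, and the frame position are assumptions made for the example, and the window uses the common 0.54/0.46 Hamming coefficients.

```python
import math

def hamming_window(n):
    """Return an n-point Hamming window (standard 0.54 / 0.46 coefficients)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def windowed_frame(signal, start, length):
    """Cut `length` samples starting at `start` and taper them with a Hamming window."""
    frame = signal[start:start + length]
    if len(frame) != length:
        raise ValueError("frame runs past the end of the signal")
    window = hamming_window(length)
    return [s * w for s, w in zip(frame, window)]

# Hypothetical test signal: a 200 Hz sine sampled at 20 kHz.
fs = 20000
signal = [math.sin(2.0 * math.pi * 200.0 * t / fs) for t in range(4096)]
frame = windowed_frame(signal, start=1000, length=1024)
```

Repeating `windowed_frame` at successive `start` positions gives the time series of quasi-stationary frames from which a time series of spectral envelopes can be computed.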

In addition to the analysis methods above, which yield a time series of spectral envelopes, methods for analyzing non-stationary speech signals that build the temporal variation of the system into the model have also been proposed. Even with these methods, however, the result is still a time series of spectral envelopes.

Another important analysis method is frequency analysis with a filter bank. In this method, the input speech signal is passed through a large number of band-pass filters with different center frequencies, and the filter outputs are taken as the spectral intensity. One advantage of this method is, for example, the ease of real-time processing when it is implemented in hardware.

Thus, speech analysis often determines the spectral envelope of the input speech signal. One known way of analyzing the speech signal from the resulting spectral envelope is formant analysis, which extracts formant frequencies, formant bandwidths, and so on from the local peaks of the envelope. This method is based on the fact that each vowel has its own characteristic formant frequencies and formant bandwidths, and that each consonant is characterized by the way the formant frequencies change toward the following vowel. For example, the five Japanese vowels (a, i, u, e, o) are characterized by the two lowest formant frequencies $F_1$ and $F_2$, and for speakers of the same sex and similar age, $F_1$ and $F_2$ each take roughly the same values.

Accordingly, vowels can be identified by detecting the formant frequencies $F_1$ and $F_2$.

Local-peak analysis is also known. It does not restrict itself to formants, but extracts the local peaks of the spectral envelope and focuses on their frequencies and temporal transitions. This method is based on the idea that, in vowel transitions and consonant segments, the phonetic features appear in the temporal variation of the local peaks.

In addition, a method has been proposed in which the spectral envelope curve itself is taken as the feature of the speech signal and is used for identification, classification, or display in subsequent processing.

As described above, extracting the spectral envelope is important in speech signal analysis. Besides the spectral envelope itself, the formant frequencies and bandwidths derived from it, as well as the local-peak frequencies and their transitions, are also used as quantities that characterize speech.

[Problems to be Solved by the Invention] When a person produces speech, the phoneme is considered to be shaped by the resonance and anti-resonance characteristics of the vocal tract. For example, the resonance frequencies appear as formants on the spectral envelope. Consequently, speakers whose vocal tract structures are nearly identical produce nearly the same spectrum for the same phoneme.

However, when vocal tract lengths differ markedly, as between men and women or between children and adults, the resonance and anti-resonance frequencies shift, and it is known that the spectral envelopes for the same phoneme no longer coincide in shape. The frequencies of the local peaks and formants therefore shift as well. This is a serious drawback for analyses whose purpose is to extract the same result for the same phoneme regardless of the speaker, such as speaker-independent speech recognition or the visual display of speech for the hearing impaired.

Two methods are conventionally known for solving this problem: preparing a large number of standard patterns, and taking ratios of formant frequencies.

In the former, the spectral envelopes of many different people, such as men, women, adults, and children, are registered as standard patterns, and an unknown input pattern is assigned to the most similar of these standard patterns, so that speech from an unspecified large number of speakers can be recognized. However, to handle speech from an arbitrary person, this method requires a very large number of standard patterns, and comparing against them takes a long time. In addition, because the method does not extract a result normalized with respect to vocal tract length, it cannot be used to display phonetic features that are independent of vocal tract length.

The latter method, taking ratios of formant frequencies, is well known as a way of extracting vowel features that are independent of vocal tract length. In more detail, among the local peaks of the spectral envelope, the frequencies $F_1$, $F_2$, $F_3$ of the first, second, and third formants, which are considered relatively stable for vowels, are extracted, and ratios between them, for example $F_1/F_3$ and $F_2/F_3$, are computed and used as features. The rationale is that if the vocal tract length is multiplied by $a$, the formant frequencies are multiplied by $1/a$, becoming $F_1/a$, $F_2/a$, $F_3/a$, while their ratios remain unchanged.
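The invariance of the formant ratios under a change of vocal tract length can be checked with a few lines of arithmetic. The formant values and the scale factor below are illustrative assumptions, not measurements from the patent.

```python
def formant_ratios(f1, f2, f3):
    """Return the ratio features (F1/F3, F2/F3)."""
    return (f1 / f3, f2 / f3)

def scale_vocal_tract(formants, a):
    """Multiplying the vocal tract length by `a` divides every formant by `a`."""
    return tuple(f / a for f in formants)

reference = (700.0, 1200.0, 2600.0)           # hypothetical F1, F2, F3 in Hz
shorter = scale_vocal_tract(reference, a=0.8)  # tract 0.8x as long, formants 1.25x higher

ref_ratios = formant_ratios(*reference)
short_ratios = formant_ratios(*shorter)
```

Every formant moves (700 Hz becomes 875 Hz), yet `ref_ratios` and `short_ratios` agree up to rounding, which is the property the ratio method relies on.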

This method gives good results for vowels when the first, second, and third formants are extracted stably, but when they are not, the reliability of the analysis drops markedly. The method also has the drawback that it cannot be applied to consonant segments. In consonant segments the formants, which are resonance characteristics of the vocal tract, are not defined, and in practice the local peaks corresponding to the first, second, and third formants are not necessarily observed on the spectral envelope, so $F_1$, $F_2$, $F_3$ cannot be extracted and their ratios cannot be taken. Moreover, not only in consonants but also at the onset and end of vowels, the formants are not necessarily stable, so incorrect formant frequencies may be extracted. In such cases the formant frequency ratios change discontinuously and take completely wrong values.

This method can therefore be used only for the stable portions of vowels in the speech signal; the beginnings and ends of vowels and the consonant segments must be analyzed by other means. In that case, however, the extracted parameters differ between the stable vowel portion and the other portions, so the continuous change from a consonant to a vowel cannot be described. In other words, the formant-ratio method is inherently applicable only to stationary vowels.

As described above, no method has yet been found for extracting phoneme-specific features from the spectral envelopes of vocal tracts in general, covering an unspecified number of speakers with different vocal tract lengths.

The object of the present invention is to provide a speech analysis device that can obtain a spectrum specific to each phoneme, unaffected by differences in vocal tract length due to the speaker's sex, age, and so on.

[Structure of the Invention] (Means for Solving the Problems, and Operation) To achieve the above object, the present invention takes the logarithm of and normalizes the spectral envelope extracted from the input speech signal, integrates the resulting normalized logarithmic spectral envelope over frequency on a logarithmic scale, and projects the envelope information of the spectral envelope from the frequency axis onto the integrated output.

(Embodiment) A speech analysis device according to one embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of the embodiment. Before FIG. 1 is explained, the principle of the embodiment is first described with reference to FIGS. 2 to 6.

First, FIG. 2 compares the spectral envelopes of a vowel for different vocal tract lengths. FIG. 2 shows, on a logarithmic scale, the spectral envelopes $P(f)$ of the same phoneme for two different vocal tract lengths $l_1$ and $l_2$.

According to FIG. 2, in the frequency range from a few hundred Hz to about 5 kHz, the spectral envelope $P_1(f)$ for the long vocal tract length $l_1$ ($\log P_1(f)$ in FIG. 2) and the spectral envelope $P_2(f)$ for the short vocal tract length $l_2$ ($\log P_2(f)$ in FIG. 2) have the form of a spectral envelope $P(f)$ of roughly the same shape scaled by a constant factor along the frequency axis $f$ with the origin fixed. In contrast, from 0 Hz to about 100 Hz and above about 5 kHz the differences are large and the similarity of the spectral envelopes $P(f)$ is small. These ranges, however, relate to individual differences in voice quality and are not very important for speech analysis. Since the resonance frequencies scale inversely with the vocal tract length, letting $r$ be the ratio $l_1/l_2$ of the vocal tract lengths, the following relation holds between $P_1(f)$ and $P_2(f)$, after magnitude normalization and taking logarithms, from a few hundred Hz to about 5 kHz:

$$\log|\bar{P}_1(f)| \approx \log|\bar{P}_2(rf)| \tag{1}$$

The magnitude-normalized envelopes $\bar{P}_1(f)$ and $\bar{P}_2(f)$ are used here, rather than $P_1(f)$ and $P_2(f)$ themselves, in order to remove differences in the amplitude of the speech signal.

If the first to third formants are now extracted, their frequencies $F_1$, $F_2$, $F_3$ and $F_1'$, $F_2'$, $F_3'$ are as shown in FIG. 2. Between them the relation

$$F_1'/F_1 \approx F_2'/F_2 \approx F_3'/F_3 = r \tag{2}$$

holds, so ratios of formant frequencies, such as those in equation (3), are invariant.

$$F_1/F_2 \approx F_1'/F_2', \qquad F_1/F_3 \approx F_1'/F_3' \tag{3}$$

This is illustrated with the example of FIG. 3 and the transformed example of FIG. 4. FIG. 3 shows the distribution of $F_1$ and $F_2$ for the five Japanese vowels, for men and women in their twenties and thirties. As FIG. 3 shows, the distributions for men and women are shifted considerably; for example, the male /a/ and the female /o/, and the male /e/ and the female /u/, fall in the same range.

In contrast, FIG. 4 shows the distribution of $F_1/F_3$ and $F_2/F_3$. FIG. 4 shows that the difference between men and women disappears when formant frequency ratios are used.

Therefore, within the range (on the order of kHz) in which the spectral envelope of a vowel is in a steady state, values on the frequency axis are multiplied by a constant. That is, the transformation (R) given by equation (4), which maps $f$ to $r \cdot f$, is applied to the spectral envelope $P(f)$.

$$P(f) \rightarrow P'(f) = P(r \cdot f) \tag{4}$$

If a transformation (U) can be found that projects the spectral envelopes $P(f)$ and $P(r \cdot f)$ onto a function space that is invariant under this transformation, then in that space spectral envelopes $P(f)$ belonging to the same phoneme should have the same shape regardless of the vocal tract length $l$.

FIG. 5 illustrates this conceptually. It shows how the differences in the spectral envelope $P(f)$ of the Japanese phoneme /a/ or /i/ caused by differences in the vocal tract length $l$ are converted by the transformation (U) into a spectral envelope $P'(f)$ of the same shape. That is, for the phoneme /a/, the spectral envelope $P_{1a}(f)$ at vocal tract length $l_1$ and the spectral envelope $P_{2a}(f)$ at vocal tract length $l_2$ are converted by (U) into a spectral envelope $P'_{a}(f)$ of the same shape. Similarly, for the phoneme /i/, the spectral envelopes $P_{1i}(f)$ and $P_{2i}(f)$ are converted into the spectral envelope $P'_{i}(f)$.

In this embodiment, the transformation (U) is realized as follows. Let $L(f)$ denote the magnitude-normalized logarithmic spectral envelope integrated along the frequency axis on a logarithmic scale:

$$L(f) = \int_{\varepsilon}^{f} \log|\bar{P}(h)| \, d\log h \tag{5}$$

Here $\varepsilon$ is a sufficiently small positive value close to 0, determined by the conditions described later.
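The integral over a logarithmic frequency scale that defines L(f) can be approximated numerically. The sketch below is an assumption about one reasonable discretization (trapezoidal rule on a geometric frequency grid), not the circuit used in the patent; the function name, grid density, and constant test envelope are hypothetical. The lower limit of 30 Hz follows the value of ε quoted for the integration unit later in the description.

```python
import math

def log_frequency_integral(freqs, log_env):
    """Cumulative trapezoidal integral of log_env with respect to log(f).

    freqs must be positive and ascending; freqs[0] plays the role of epsilon.
    Returns a list whose i-th entry approximates the integral from freqs[0] to freqs[i].
    """
    if len(freqs) != len(log_env):
        raise ValueError("freqs and log_env must have the same length")
    result = [0.0]
    for i in range(1, len(freqs)):
        d_log_f = math.log(freqs[i] / freqs[i - 1])
        result.append(result[-1] + 0.5 * (log_env[i] + log_env[i - 1]) * d_log_f)
    return result

EPSILON_HZ = 30.0
freqs = [EPSILON_HZ * 2.0 ** (i / 48) for i in range(355)]  # 30 Hz up to roughly 5 kHz
flat_env = [2.0] * len(freqs)                                # hypothetical constant envelope
L = log_frequency_integral(freqs, flat_env)
```

For a constant envelope of height 2 the exact answer is 2·ln(f/ε), which gives a simple sanity check on the discretization.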

Because $L(f)$ in equation (5) depends on the functional form of $P(f)$, it is written $L_P(f)$. Applying the transformation of equation (4) to it gives

$$L_{P'}(f) = \int_{\varepsilon}^{f} \log|\bar{P}(rh)| \, d\log h = \int_{r\varepsilon}^{rf} \log|\bar{P}(k)| \, d\log k$$

Here $k = rh$, so $\log k = \log h + \log r$ and hence $d\log k = d\log h + d\log r$; since $r$ is a constant, $d\log k = d\log h$.

Therefore

$$L_{P'}(f) = L_P(rf) - \int_{\varepsilon}^{r\varepsilon} \log|\bar{P}(k)| \, d\log k \tag{6}$$

If the second term on the right-hand side of equation (6) is sufficiently small, then

$$L_{P'}(f) \approx L_P(rf) \tag{7}$$

holds. Now consider the curve $(P(f), L_P(f))$ formed by the spectral envelope $P(f)$ and $L_P(f)$, expressed parametrically in the frequency $f$. As curves traced out by the parameter $f$,

$$(P(f), L_P(f)) = (P(rf), L_P(rf)) = (P'(f), L_{P'}(f)) \tag{8}$$

so the transformation (U)

$$(P(f), f) \rightarrow (P(f), L_P(f)) \tag{9}$$

is a projection onto a function space that is invariant under the transformation (R) of equation (4). This is because, if the envelope is stretched or compressed proportionally along the frequency axis from the vicinity of frequency 0, replacing the logarithmic frequency axis with the integral $L(f)$ of equation (5) absorbs the shift of the normalized logarithmic spectral envelope along the frequency axis.
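The invariance argued above can be checked numerically. The following is a minimal sketch under stated assumptions: the two-bump `log_envelope` is a synthetic stand-in for a normalized logarithmic spectral envelope, the scale factor r of about 1.19 is arbitrary, and the grid is chosen so that multiplying a grid frequency by r lands exactly on another grid point. None of these choices come from the patent.

```python
import math

def log_envelope(f):
    """Hypothetical normalized log spectral envelope with two formant-like bumps."""
    bumps = ((700.0, 1.0), (1200.0, 0.8))  # (center in Hz, height), illustrative only
    return sum(h * math.exp(-((math.log(f / fc)) / 0.3) ** 2) for fc, h in bumps)

def projected_curve(env, freqs):
    """Return (L(f), env(f)) pairs, with L integrated over log-frequency from freqs[0]."""
    values = [env(f) for f in freqs]
    curve = [(0.0, values[0])]
    total = 0.0
    for i in range(1, len(freqs)):
        total += 0.5 * (values[i] + values[i - 1]) * math.log(freqs[i] / freqs[i - 1])
        curve.append((total, values[i]))
    return curve

EPS = 30.0                    # lower integration limit in Hz
STEPS_PER_OCTAVE = 100
SHIFT = 25                    # r = 2**(SHIFT / STEPS_PER_OCTAVE), about 1.19
r = 2.0 ** (SHIFT / STEPS_PER_OCTAVE)
freqs = [EPS * 2.0 ** (i / STEPS_PER_OCTAVE) for i in range(740)]  # 30 Hz to about 5 kHz

curve_p = projected_curve(log_envelope, freqs)                   # original envelope P
curve_q = projected_curve(lambda f: log_envelope(r * f), freqs)  # scaled envelope P'(f) = P(r f)

# Compare P' at frequency f with P at frequency r*f, point by point.
worst = max(
    max(abs(curve_q[i][0] - curve_p[i + SHIFT][0]),
        abs(curve_q[i][1] - curve_p[i + SHIFT][1]))
    for i in range(len(freqs) - SHIFT)
)
```

Here `worst` is the largest mismatch between the two projected curves. It stays negligible because the synthetic envelope is essentially zero between ε and rε, which is the condition under which the correction term in the derivation can be dropped.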

FIG. 6 is a conceptual diagram of the transformation (U). Note that FIG. 6 shows, as the envelope information described later, not the spectral envelope $P(f)$ but the logarithmic spectral envelope $\log|P(f)|$. In this case equation (8) is expressed as equation (10).

$$(\log|P(f)|, L_P(f)) = (\log|P'(f)|, L_{P'}(f)) \tag{10}$$

If, instead of the spectral envelope $P(f)$ or the logarithmic spectral envelope $\log|P(f)|$, the normalized logarithmic spectral envelope $\log|\bar{P}(f)|$ is used, equation (8) is expressed as equation (11).

$$(\log|\bar{P}(f)|, L_P(f)) = (\log|\bar{P}'(f)|, L_{P'}(f)) \tag{11}$$

The condition under which the second term on the right-hand side of equation (6) can be neglected is now explained. The practical range of the ratio $r$ of vocal tract lengths is considered to be from about 1/2 to 2, and over the range from $\varepsilon$ to $2\varepsilon$ on the frequency axis the normalized logarithmic spectral envelope can be regarded as nearly constant. The condition is therefore obtained by evaluating the integral $I$ of equation (12):

$$I = \int_{\varepsilon}^{2\varepsilon} \log|\bar{P}(k)| \, d\log k \approx \log|\bar{P}(\varepsilon)| \cdot \log 2 \tag{12}$$

The speech spectral envelope drops sharply at frequencies below half the pitch frequency, so if $\varepsilon$ is taken to be about 100 Hz, equation (12) shows that the second term on the right-hand side of equation (6) can effectively be neglected. Conversely, if $\varepsilon$ is made too small, the contribution of the low-frequency portion to the integral $L(f)$ of equation (5) becomes too large, and the result becomes sensitive to the shape of the spectrum near the origin. Values of $\varepsilon$ below 10 Hz are therefore unsuitable. From the above, a value of $\varepsilon$ of about 10 Hz to 100 Hz is appropriate.

The outline of the embodiment has been described above. Returning to FIG. 1, the configuration that carries out the processing of the embodiment is now described.

In FIG. 1, the spectral envelope extraction unit 11 extracts the spectral envelope $P(f)$ of the input speech signal ($A_{IN}$). Various methods can be used to extract the spectral envelope, such as the extraction methods used in AR-model speech analysis, in cepstral speech analysis, and in frequency analysis with a filter bank.

The logarithmization unit 12 takes the logarithm of the magnitude of the spectral envelope $P(f)$ extracted by the spectral envelope extraction unit 11. The normalization unit 13 normalizes the magnitude of the logarithmic spectral envelope $\log|P(f)|$ output from the logarithmization unit 12.

The logarithmization unit 12 and the normalization unit 13 together form the conversion unit 10. Methods for normalizing the magnitude of the logarithmic spectral envelope $\log|P(f)|$ include, for example, automatic gain control, and differentiating $\log|P(f)|$ with respect to the frequency $f$ to drop the constant term, then integrating and adding a fixed value.
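A change in signal gain multiplies the spectral envelope by a constant, which adds a constant to its logarithm, so normalization amounts to discarding that constant and restoring a fixed level. The sketch below uses mean removal as a simple stand-in; this is an assumption made for illustration and is not the automatic gain control or the differentiate-then-integrate procedure named in the text, although it has the same effect on a constant offset. The sample values are hypothetical.

```python
import math

def normalize_log_envelope(log_env, level=0.0):
    """Shift a log spectral envelope so that its mean equals `level`."""
    mean = sum(log_env) / len(log_env)
    return [value - mean + level for value in log_env]

quiet = [0.2, 1.5, 0.9, 0.4, 1.1]                  # hypothetical log-envelope samples
loud = [value + math.log(4.0) for value in quiet]   # same shape, 4x the amplitude

quiet_norm = normalize_log_envelope(quiet)
loud_norm = normalize_log_envelope(loud)
```

After normalization `quiet_norm` and `loud_norm` coincide up to rounding, so the later integration step sees the same curve regardless of how loudly the phoneme was spoken.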

The integration unit 14 integrates the normalized logarithmic spectral envelope $\log|\bar{P}(f)|$ output from the normalization unit 13, with the frequency on a logarithmic scale as the variable. That is, the integration unit 14 integrates $\log|\bar{P}(f)|$ according to the integral of equation (5). Here 30 Hz is used as the value of $\varepsilon$.

The projection unit 15 receives the logarithmic spectral envelope $\log|P(f)|$ output from the logarithmization unit 12 and the integration result output from the integration unit 14, projects $\log|P(f)|$ from the frequency $f$ onto the integral function $L(f)$ ($= L_P(f)$) as shown in FIG. 6, and displays the projection result. That is, the projection unit 15 takes $L_P(f)$ on the x-axis and $\log|P(f)|$ on the y-axis of an x-y orthogonal coordinate system and displays the curve parametrically in the frequency $f$, thereby turning the analysis result for the input speech signal ($A_{IN}$) into a pattern.

In the processing of the projection unit 15, the value taken on the y-axis may, as is clear from the explanation of equations (10) and (11), also be the spectral envelope $P(f)$ or the normalized logarithmic spectral envelope $\log|\bar{P}(f)|$. It may also be the normalized spectral envelope $\bar{P}(f)$. In other words, the envelope information subjected to projection in this invention means at least the four patterns mentioned above.

Of course, in the processing of the projection unit 15, the envelope information may instead be taken on the x-axis and $L_P(f)$ on the y-axis.

An example of actual measurement with the speech analysis of this embodiment is now described.

FIGS. 7 and 8 show the logarithmic spectral envelope $\log|P(f)|$ of the Japanese phoneme /i/ for a female and a male speaker, respectively. These logarithmic spectral envelopes were obtained as follows.

First, the speech signal ($A_{IN}$) input to the extraction unit 11 is speech picked up with a condenser microphone, sampled at a period of 50 μs, and digitized to 12 bits. An 8K-word wave memory is used to capture the speech.

The extraction unit 11 obtains the spectral envelope $P(f)$ from this input speech signal ($A_{IN}$) by cepstral analysis. In the cepstral analysis, a 1024-point frame taken from the stable portion of the vowel is differentiated and multiplied by a Hamming window, and the result is Fourier-transformed with the FFT algorithm to obtain the spectral envelope $P(f)$.
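The measurement parameters quoted above (a 50 μs sampling period and a 1024-point analysis frame) imply the following figures. The arithmetic is a small worked example; treating "8K words" as 8 × 1024 samples is an assumption.

```python
SAMPLE_PERIOD_S = 50e-6     # sampling period stated in the text
FRAME_LENGTH = 1024         # analysis frame length in samples
MEMORY_WORDS = 8 * 1024     # assumption: "8K words" means 8192 samples

sample_rate_hz = 1.0 / SAMPLE_PERIOD_S                      # 20 kHz
frame_duration_ms = FRAME_LENGTH * SAMPLE_PERIOD_S * 1e3    # 51.2 ms per frame
bin_spacing_hz = sample_rate_hz / FRAME_LENGTH              # about 19.5 Hz per FFT bin
capture_duration_s = MEMORY_WORDS * SAMPLE_PERIOD_S         # about 0.41 s of speech
```

A 20 kHz sampling rate gives a 10 kHz Nyquist limit, which comfortably covers the range up to about 5 kHz that the analysis relies on.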
There is. The kegstral analysis is set to obtain the spectral envelope P<f) by differentiating the frame of 1024/India length of the vowel stable part, applying a Hamming window to this, and Fourier transforming the resultant using the FFT algorithm. There is.

The logarithmization unit 12 takes the logarithm of the absolute value of the spectral envelope $P(f)$ obtained in this way, applies an inverse Fourier transform to obtain the cepstrum, applies a rectangular window with a cutoff of 1.7 to 2.5 ms on the quefrency axis, and Fourier-transforms the result to obtain the logarithmic spectral envelope $\log|P(f)|$.
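The cepstral smoothing step (log spectrum, inverse transform, rectangular window on the quefrency axis, forward transform) can be sketched as below. This is a toy illustration under stated assumptions: it uses a naive O(N²) DFT from the standard library instead of an FFT, a 128-point synthetic log spectrum instead of real speech, and a cutoff expressed in samples. For reference, at a 50 μs sampling period a 2 ms quefrency cutoff would correspond to 40 samples.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def cepstral_envelope(log_magnitude, cutoff):
    """Smooth a log-magnitude spectrum by keeping only low-quefrency cepstral terms."""
    n = len(log_magnitude)
    cepstrum = dft(log_magnitude, inverse=True)
    # Rectangular lifter: keep quefrencies 0..cutoff and their mirror images.
    liftered = [c if (q <= cutoff or q >= n - cutoff) else 0.0
                for q, c in enumerate(cepstrum)]
    return [v.real for v in dft(liftered)]

N = 128
# Hypothetical log spectrum: a slow "envelope" plus a fast pitch-like ripple.
slow = [math.cos(2.0 * math.pi * 2 * k / N) for k in range(N)]
log_mag = [s + 0.5 * math.cos(2.0 * math.pi * 20 * k / N) for k, s in enumerate(slow)]

envelope = cepstral_envelope(log_mag, cutoff=10)
max_error = max(abs(e - s) for e, s in zip(envelope, slow))
```

The ripple sits at quefrency index 20, above the cutoff of 10, so it is removed and `envelope` reproduces the slow component; `max_error` measures how closely.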

In obtaining this logarithmic spectral envelope log|P(f)|, the cutoff on the quefrency axis is chosen according to the pitch frequency. In addition, to normalize the magnitude of log|P(f)|, the envelope is computed after the zeroth-order component of the cepstrum has been set to a constant value.
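The chain described above (differentiate, window, FFT, log magnitude, inverse FFT, rectangular lifter, fixed zeroth coefficient, forward FFT) can be sketched in NumPy as follows. This is an illustrative reading, not the patent's implementation: the function name `log_envelope`, the use of a first difference for "differentiation", the 2.0 ms default cutoff (picked from inside the quoted 1.7 to 2.5 ms range), and the constant `c0 = 0.0` are all assumptions.

```python
import numpy as np

def log_envelope(frame, fs, cutoff_ms=2.0, c0=0.0):
    """Cepstrally smoothed log spectral envelope of one frame (sketch)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # ASSUMPTION: "differentiation" is taken as a first difference.
    diffed = np.diff(frame, prepend=frame[0])
    windowed = diffed * np.hamming(n)
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)
    cepstrum = np.fft.ifft(log_mag).real
    # Quefrency of each bin in seconds; the cepstrum of a real signal is symmetric.
    k = np.arange(n)
    quefrency_s = np.minimum(k, n - k) / fs
    # Rectangular lifter: keep only the low-quefrency part (below the pitch period).
    cepstrum = cepstrum * (quefrency_s <= cutoff_ms * 1e-3)
    # Fix the zeroth coefficient so that overall level (loudness) drops out.
    cepstrum[0] = c0
    smooth = np.fft.fft(cepstrum).real
    return smooth[: n // 2 + 1]          # bins from 0 Hz up to fs / 2

rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)        # stand-in for a 1024-point speech frame
quiet = log_envelope(frame, fs=20000.0)
loud = log_envelope(10.0 * frame, fs=20000.0)
```

Because a gain change only shifts the zeroth cepstral coefficient, `quiet` and `loud` come out the same, which is the magnitude normalization the text describes.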

The logarithmic spectral envelopes log|P(f)| shown in Figs. 7 and 8 were obtained as described above.

Comparing the logarithmic spectral envelopes of Figs. 7 and 8, the two spectral shapes are similar in the frequency range below about 5 kHz, but above that range the female spectral shape is shifted toward higher frequencies.

For this logarithmic spectral envelope log|P(f)|, L_p(f) as expressed by Equation (5) was computed with the parameter in Equation (5) set to 50 Hz. Figs. 9 and 10 plot L_p(f) on the x-axis and the logarithmic spectral envelope log|P(f)| on the y-axis for the two speakers, respectively. Although the peak heights and fine nuances differ somewhat, the shift along the frequency axis seen between Figs. 7 and 8 has been eliminated.
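Equation (5) is not reproduced in this part of the text, so the following is only a sketch under an assumed form: L(f) is taken to be the running integral of the envelope over log-frequency, starting at 50 Hz. The names `project_envelope` and `toy_envelope`, the Gaussian "formants", and the 1.2 stretch factor are all hypothetical. The sketch illustrates why re-plotting an envelope against its own integral can cancel a uniform stretch of the frequency axis, such as one caused by a shorter vocal tract.

```python
import numpy as np

def project_envelope(freqs_hz, envelope, f_start_hz=50.0):
    """Return (L, p): the envelope p re-indexed by its running integral L.

    ASSUMPTION: L(f) = integral from f_start_hz to f of p(f') / f' df'.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    envelope = np.asarray(envelope, dtype=float)
    keep = freqs_hz >= f_start_hz
    f, p = freqs_hz[keep], envelope[keep]
    integrand = p / f
    # Cumulative trapezoidal rule.
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(f)
    L = np.concatenate(([0.0], np.cumsum(steps)))
    return L, p

def toy_envelope(freqs_hz, stretch=1.0):
    """Two Gaussian bumps; `stretch` scales the whole frequency axis."""
    u = np.asarray(freqs_hz, dtype=float) / stretch
    return (np.exp(-0.5 * ((u - 800.0) / 100.0) ** 2)
            + 0.6 * np.exp(-0.5 * ((u - 2500.0) / 150.0) ** 2))

freqs = np.linspace(0.0, 10000.0, 20001)
L_a, p_a = project_envelope(freqs, toy_envelope(freqs, 1.0))   # reference speaker
L_b, p_b = project_envelope(freqs, toy_envelope(freqs, 1.2))   # axis stretched by 20 %

f_kept = freqs[freqs >= 50.0]
peak_a, peak_b = int(np.argmax(p_a)), int(np.argmax(p_b))
print(f_kept[peak_a], f_kept[peak_b])   # peaks sit at different frequencies
print(L_a[peak_a], L_b[peak_b])         # but at nearly the same L
```

On the frequency axis the main peak moves from 800 Hz to 960 Hz, while on the assumed L axis the two peaks coincide to within numerical error.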

Fig. 11 shows the time-series variation of the logarithmic spectral envelope log|P(f)| obtained by shifting the frame position in time across the onset of the phoneme "ta" uttered by a female speaker. Fig. 12 shows the time-series variation of the speech analysis result obtained by applying the transformation (U) of this embodiment to the logarithmic spectral envelope log|P(f)| of Fig. 11.

These measurements show that the transformation (U) of this embodiment also acts stably on the consonant part and on the vowel onset, and that the spectral change is displayed smoothly.

Figs. 13 and 14 show the time-series variation of the Japanese phoneme "a" for a male and a female speaker, respectively, obtained in the same way as Fig. 12. The vowel portions of Figs. 13 and 14 and of Fig. 12 show that the large male/female difference in vocal tract length has been eliminated.

As detailed above, this embodiment is configured to obtain the speech analysis result by projecting the envelope information (the spectral envelope P(f) or the logarithmic spectral envelope log|P(f)|) from the frequency (f) axis onto L_p(f) by means of the transformation (U) defined by Equations (5) and (9).

With this configuration, differences in the analysis result caused by differences in the speaker's vocal tract length (L) are eliminated, so a phoneme-specific analysis result is always obtained. The embodiment can be applied to the spectral envelope of any part of the input audio signal (A_IN), regardless of whether that part is voiced or unvoiced, vowel or consonant. Moreover, because the analysis result does not depend on the accuracy or stability of formant frequency (F) extraction, the method is applicable to every part of the input audio signal (A_IN).

In particular, this embodiment solves a problem that was previously impossible to solve: determining how the spectral envelope changes during the transition from the consonant to the vowel without being affected by individual differences in vocal tract length (L).

Furthermore, in this embodiment the integrand of Equation (5) is neither the spectral envelope P(f) nor the logarithmic spectral envelope log|P(f)| but the normalized logarithmic spectral envelope, so the influence of voice loudness on the phoneme can be removed.

The present invention is not limited to the embodiment described above.

For example, the embodiment above described the case in which, to obtain the integrand of Equation (5), the spectral envelope P(f) output from the extraction unit 11 is first logarithmized and then normalized. These two operations may of course be performed in the reverse order.
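The interchangeability of the two steps can be checked numerically. The sketch below assumes one specific meaning of "normalization", namely removing the mean log level (which corresponds to fixing the zeroth cepstral coefficient); under that assumption, normalizing first means dividing by the geometric mean. The function names and sample values are hypothetical.

```python
import numpy as np

def log_then_normalize(envelope):
    """Take the log, then remove the mean log level."""
    log_env = np.log(envelope)
    return log_env - log_env.mean()

def normalize_then_log(envelope):
    """Divide by the geometric mean, then take the log."""
    geometric_mean = np.exp(np.mean(np.log(envelope)))
    return np.log(envelope / geometric_mean)

envelope = np.array([0.5, 2.0, 8.0, 1.0, 0.25])   # made-up positive envelope samples
a = log_then_normalize(envelope)
b = normalize_then_log(envelope)
louder = log_then_normalize(3.0 * envelope)        # same shape, three times the level
```

Both orders give the same values, and the result does not change when the whole envelope is scaled by a constant gain.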

The speech analysis device of the present invention may of course be implemented in either hardware or software.

Next, another embodiment of the speech analysis device according to the present invention is described in detail with reference to Figs. 15 to 18.

First, the configuration of this embodiment is described with reference to Fig. 15.

Like the embodiment described above, this embodiment has a spectral envelope extraction unit 11', a logarithmization unit 12', a normalization unit 13', an integration unit 14', and a synthesis unit 15'. The synthesis unit 15' performs the same operation as the projection unit 15 of the earlier embodiment. The speech analysis device of this embodiment further has a control device 16' and a display device 17'.

The logarithmization unit 12' and the normalization unit 13' together form the conversion unit 10'.

Next, the configuration of each unit is described in detail with reference to Figs. 16 and 17.

The spectral envelope extraction unit 11' consists of: an adjuster 211 for adjusting the level of the input audio signal; a high-frequency emphasis circuit 212 for emphasizing the high-frequency range of the input audio signal; a buffer 213 for boosting the signal output from the circuit 212 for the subsequent stages; a distribution circuit 214 for dividing the signal output from the buffer 213 among suitable frequency ranges and boosting and outputting each of them; a filter bank 215 for extracting the signals output from the distribution circuit 214 in each predetermined frequency range, in accordance with control data, to build the spectral envelope; and a multiplexer 216 for serially outputting the spectral envelope data from the filter bank 215 in accordance with output-selection control data from the control device 16'.

In this embodiment, the distributor 214 consists of four distributors 214-1 to 214-4. The filter bank 215 consists of eight filter sections 215-1 to 215-8, and each filter section is built, as shown in Fig. 18, from a bandpass filter 315, a rectifier 316, and a low-pass filter 317. The bandpass filter 315 and the low-pass filter 317 are the DT-212D and the DT-6FL1, respectively, manufactured by NF Circuit Design Block Co., Ltd.

The bandpass filter 315 is controlled by control data from the control device 16' and passes the signal in a specific frequency band. The rectifier 316 full-wave rectifies the signal output from the bandpass filter 315 and supplies it to the low-pass filter 317. The low-pass filter 317 extracts the low-frequency component, that is, the slowly varying component, of the signal from the rectifier 316. This yields the spectral envelope in the band limited by the bandpass filter 315.
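One channel of this chain (bandpass, full-wave rectification, low-pass) can be imitated digitally. The sketch below is a stand-in, not a model of the analog modules named in the text: the bandpass is done by zeroing FFT bins, the low-pass is a moving average, and the function name `channel_envelope`, the band edges, and the 20 ms smoothing length are assumptions.

```python
import numpy as np

def channel_envelope(x, fs, f_lo, f_hi, smooth_ms=20.0):
    """Envelope of x inside the band [f_lo, f_hi] Hz (digital sketch)."""
    x = np.asarray(x, dtype=float)
    # Bandpass stand-in: zero every FFT bin outside the band.
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    band = np.fft.irfft(spectrum, n=len(x))
    # Full-wave rectification.
    rectified = np.abs(band)
    # Low-pass stand-in: moving average keeps only the slowly varying part.
    win = max(1, int(round(smooth_ms * 1e-3 * fs)))
    kernel = np.ones(win) / win
    return np.convolve(rectified, kernel, mode="same")

fs = 20000.0
t = np.arange(4000) / fs                         # 0.2 s of signal
tone = np.sin(2.0 * np.pi * 1000.0 * t)          # 1 kHz test tone, amplitude 1
env_in = channel_envelope(tone, fs, 900.0, 1100.0)    # channel containing the tone
env_out = channel_envelope(tone, fs, 2900.0, 3100.0)  # channel away from the tone
mid = len(t) // 2
```

For a unit-amplitude sine the rectified average is about 2/π, so `env_in[mid]` lands near 0.64, while `env_out[mid]` stays near zero.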

The logarithmization unit 12' consists of a sample-and-hold circuit (S/H) 222, which receives in sequence the spectral envelope converted to serial data by the spectral envelope extraction unit 11' and samples and holds the spectral envelope data of each channel, and a logarithmization circuit (LOG) 223, which receives the signal held by the S/H 222 and takes its logarithm.

The logarithmization unit 12' is followed by the normalization unit 13', where the logarithmic spectral envelope output from the logarithmization unit 12' is normalized by a normalization circuit 231. The output of the normalization circuit 231 is supplied to the integrator 241 of the integration unit 14' and to the synthesis unit 15'.

The synthesis unit 15' consists of: a sample-and-hold circuit (S/H) 251 for sampling and holding the data integrated by the integration unit 14'; an A/D converter 252 for converting the data held by the S/H 251 into digital data; a buffer memory 253 for temporarily storing the data output by the A/D converter 252 in accordance with clock pulses from a pulse generator 257; a sample-and-hold circuit (S/H) 254 for sampling and holding the spectral envelope output from the normalization unit 13'; an A/D converter 255 for converting the data held by the S/H 254 into digital data; a buffer memory 256 for temporarily storing the data output by the A/D converter 255 in accordance with clock pulses from the pulse generator 257; and a system memory in which the data held in the buffer memory 256 is stored using the data held in the buffer memory 253 as the address.

The configuration described so far is sufficient to detect the input audio signal. To display the detection result, this embodiment additionally has a display unit 17' following the synthesis unit 15'.

The display unit 17' consists of a display controller 259 for reading out the data stored in the system memory, a frame memory 258 for holding the data read from the system memory for display, and a CRT 260 for displaying the data in the frame memory 258 in accordance with instructions from the controller 259.

The frame memory 258 consists of eight frame memories. The data read from the system memory is sorted into eight levels and stored in the frame memory corresponding to its level. Each datum therefore corresponds to one point in one of the frame memories, which allows the resulting spectral envelope to be displayed in gray scale.

Next, the operation of the audio signal analysis device configured as above is described.

Before the audio signal is input, control data in BCD code is supplied from the control device 16' to the bandpass filters 315 and the low-pass filters 317.

In this embodiment there are 64 bandpass filters, so the frequency band of the spectral envelope is covered at 64 points. When the control data places these points on a logarithmic frequency axis, the spectral envelope finally obtained can be used for comparison on that frequency axis. When the control data places them on a mel-scale axis, the comparison can be made on the mel scale, which corresponds to human hearing.
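As an illustration of the two channel layouts mentioned above, the sketch below places 64 center frequencies either on a logarithmic axis or on a mel axis. The band limits (100 Hz to 8 kHz), the function names, and the particular mel formula (the widely used 2595·log10(1 + f/700) approximation) are assumptions; the text does not say which mel formula or band edges the device uses.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Common mel approximation (ASSUMPTION: not specified in the text)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def centre_frequencies(f_lo_hz, f_hi_hz, n_channels=64, axis="log"):
    """Channel center frequencies equally spaced on the chosen axis."""
    if axis == "log":
        return np.geomspace(f_lo_hz, f_hi_hz, n_channels)
    if axis == "mel":
        mels = np.linspace(hz_to_mel(f_lo_hz), hz_to_mel(f_hi_hz), n_channels)
        return mel_to_hz(mels)
    raise ValueError("axis must be 'log' or 'mel'")

log_centres = centre_frequencies(100.0, 8000.0, axis="log")
mel_centres = centre_frequencies(100.0, 8000.0, axis="mel")
```

With log-spaced channels, a plain running sum over the channel index approximates an integral over log-frequency, because every channel step covers the same interval of log f.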

After the bandpass filters 315 and the low-pass filters 317 have been set by the control data from the control device 16', the audio signal is input to the adjuster 211, where it is adjusted to have a suitable DC level. The adjusted signal is input to the high-frequency emphasis circuit 212. Because an audio signal attenuates toward higher frequencies, the circuit 212 emphasizes the high-frequency components so that they reach roughly the same level as the low-frequency components, and outputs the result to the buffer 213.

In this embodiment, the spectral envelope to be analyzed is detected after being divided into the frequency ranges of 64 channels, so the audio signal has to drive a large number of filters and is therefore boosted by the buffer 213. The signal output from the buffer 213 is then supplied to the distributor 214, where it is boosted and split four ways by band. The signal output from the distributor 214-1 is supplied to the filter banks FBOA 215-1 and FBOB 215-2, and the other distributors are connected likewise.

Each filter bank 215 contains eight channels, and each channel consists of a bandpass filter 315, a rectifier 316, and a low-pass filter 317, as shown in Fig. 18. The filter 315-1 is supplied with BCD control data from outside and, in accordance with that data, outputs a predetermined frequency component of the input signal to the rectifier circuit 316-1. The signal full-wave rectified by the rectifier circuit 316-1 is supplied to the low-pass filter 317-1.

The band of the low-pass filter 317-1 is likewise controlled by external BCD data, and the filter extracts a predetermined frequency component from the signal output by the rectifier 316-1. This yields the spectral envelope in the band limited by the bandpass filter 315.

The outputs of the low-pass filters 317-1 to 317-16 are supplied to the multiplexer 221-1. Similarly, the outputs of the filters 317-17 to 317-32 are supplied to the multiplexer 221-2, those of the filters 317-33 to 317-48 to the multiplexer 221-3, and those of the filters 317-49 to 317-64 to the multiplexer 221-4.

Because 64 channel bands are selected in this embodiment, 64 spectral envelope values are obtained. The multiplexers 221-1 to 221-4 output the data of each channel in turn, based on the timing control signal supplied from the control device 16'.

The signal output from the multiplexer 221 is then logarithmized. Because the signal output from the multiplexer 221-1 contains noise at its beginning, the timing control signal is issued so that the S/H 222 samples and holds the signal only after it has settled. The signal held by the S/H 222 is logarithmized by the LOG 223.

The logarithmized spectral envelope signal is output to the integrator 241 and to the synthesis unit 15'. The integrator 241 integrates the input spectral envelope and outputs the result to the S/H 251 of the synthesis unit 15'.

In the synthesis unit 15', the S/Hs 251 and 254 sample and hold their input signals in synchronization with the output timing of the multiplexer 221 and output them to the A/D converters 252 and 255, respectively. The A/D converters 252 and 255 convert the input analog signals into digital data and output the data to the buffer memories 253 and 256.

The memories 253 and 256 are supplied with a clock from the pulse generator 257. The generator 257 receives from the control device 16' a signal synchronized with the timing signal supplied to the multiplexer 216 and, based on that signal, supplies the clock to the memories 253 and 256 as address data. As a result, the integrated value is temporarily stored in the memory 253, and the logarithmized value of the spectral envelope is temporarily stored in the memory 256 at the same address as in the memory 253.

The data temporarily stored in the buffer memory 256 is written to the system memory by the controller 259. At this time, the data held in the buffer memory 256 is stored using the data held in the buffer memory 253 as the address. Because the data held in the buffer memory 253 is integrated data, data is not necessarily written at regular address intervals in the system memory. Depending on the extracted spectral envelope, the address may jump ahead.
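The address-mapped store can be pictured with a short sketch. Everything specific here is hypothetical: the function name `store_by_address`, the 32-word memory, the linear scaling of the integrated value to an address, and the sample values. The point is only that using the running integral as the write address leaves unwritten gaps wherever the envelope, and hence the integral, rises quickly.

```python
import numpy as np

def store_by_address(integral, log_env, n_addresses=256):
    """Write each log-envelope value at the address given by its integral."""
    integral = np.asarray(integral, dtype=float)
    log_env = np.asarray(log_env, dtype=float)
    # ASSUMPTION: the integral is scaled linearly onto the address range.
    span = integral.max() - integral.min()
    scaled = (integral - integral.min()) / span * (n_addresses - 1)
    addresses = np.round(scaled).astype(int)
    memory = np.full(n_addresses, np.nan)      # NaN marks "never written"
    memory[addresses] = log_env                # a repeated address keeps the last write
    return memory, addresses

log_env = np.array([0.1, 0.2, 2.0, 3.0, 0.3, 0.1, 1.5, 0.2])   # made-up channel values
integral = np.cumsum(log_env)                                   # running integral
memory, addresses = store_by_address(integral, log_env, n_addresses=32)
print(addresses)                 # non-uniform steps between consecutive writes
print(np.isnan(memory).sum())    # number of addresses left empty
```

The largest address jump occurs at the channel with the largest envelope value, matching the remark that the address can skip ahead.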

When consecutive addresses are far apart, the control device 16' can also be arranged to supply interpolation data. The stored data is read out by the controller 259 and supplied to the frame memory 258. At this time, it is determined which of the eight levels each read-out datum corresponds to, and the datum is supplied to the frame memory 258 on that basis. The data stored in the frame memory 258 is displayed on the CRT 260. Because the data is divided into eight levels, it is displayed in gray scale: for example, parts corresponding to peaks in the time evolution of the spectral envelope appear white and parts corresponding to valleys appear black.

As described in detail above, in this embodiment the integration can be performed on an arbitrary coordinate axis by suitably choosing the control data from the control device 16' and supplying it to the filter bank.
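The gap filling and the eight-level gray-scale sorting described above can be sketched as follows. This is an assumption-laden illustration: linear interpolation is only one possible form of "interpolation data", the eight levels are taken to be equal-width bins between the minimum and maximum, and the names `fill_gaps` and `to_bit_planes` are hypothetical.

```python
import numpy as np

def fill_gaps(memory):
    """Fill unwritten (NaN) addresses by linear interpolation (ASSUMPTION)."""
    memory = np.asarray(memory, dtype=float)
    idx = np.arange(len(memory))
    known = ~np.isnan(memory)
    return np.interp(idx, idx[known], memory[known])

def to_bit_planes(values, n_levels=8):
    """Sort values into n_levels bins; one boolean plane per level."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    levels = np.floor((values - lo) / (hi - lo) * n_levels).astype(int)
    levels = np.clip(levels, 0, n_levels - 1)      # the maximum falls in the top bin
    planes = np.zeros((n_levels, len(values)), dtype=bool)
    planes[levels, np.arange(len(values))] = True  # each point set in exactly one plane
    return levels, planes

memory = np.array([0.0, np.nan, np.nan, 3.0, np.nan, 1.0, 7.0])   # made-up contents
filled = fill_gaps(memory)
levels, planes = to_bit_planes(filled)
print(levels)    # level 7 would be shown brightest, level 0 darkest
```

Each stored point ends up set in exactly one of the eight planes, which mirrors the statement that a datum corresponds to one point in one of the frame memories.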

[Effects of the Invention] According to the present invention, a phoneme-specific analysis pattern can be obtained without being affected by vocal tract length. The invention therefore contributes greatly to applications, such as speaker-independent speech recognition, in which the same analysis result is desired for the same phoneme regardless of the speaker.

[Brief explanation of drawings]

Fig. 1 is a block diagram showing the configuration of one embodiment; Fig. 2 is a spectral envelope diagram for outlining that embodiment; Figs. 3 and 4 are measurement diagrams for outlining that embodiment; Figs. 5 and 6 are spectral envelope diagrams for outlining that embodiment; and Figs. 7 to 14 are spectral envelope diagrams illustrating actual measurements for that embodiment. Fig. 15 is a block diagram showing the configuration of another embodiment; Figs. 16 and 17 are block diagrams detailing each block shown in Fig. 15; and Fig. 18 shows the details of the filter bank shown in Fig. 16.

10: conversion unit; 11: spectral envelope extraction unit; 12: logarithmization unit; 13: normalization unit; 14: integration unit; 15: projection unit; 10': conversion unit; 11': spectral envelope extraction unit; 12': logarithmization unit; 13': normalization unit; 14': integration unit; 15': projection unit; 16': control device; 17': display unit.

Applicant's representative: Patent attorney Takehiko Suzue

Claims

[Claims]

(1) A speech analysis device comprising: conversion means for converting an input spectral envelope so that its values become appropriate; integrating means for receiving the spectral envelope converted by the conversion means and integrating it with respect to a predetermined variable; and projection means for receiving the spectral envelope converted by the conversion means and the spectral envelope integrated by the integrating means, and projecting the converted spectral envelope onto the integrated spectrum.

(2) The speech analysis device according to claim 1, wherein the conversion means comprises logarithmization means for logarithmizing and outputting the input spectral envelope data, and normalization means for normalizing the spectral envelope logarithmized by the logarithmization means.

(3) The speech analysis device according to claim 1, wherein the conversion means comprises normalization means for normalizing the input spectral envelope, and logarithmization means for logarithmizing the spectral envelope normalized by the normalization means.

(4) The speech analysis device according to claim 1, wherein the conversion means includes a logarithmization means for logarithmizing the input spectrum envelope.

(5) The speech analysis device according to claim 1, wherein the conversion means includes normalization means for normalizing the input spectrum envelope.

(6) The speech analysis device according to claim 1, wherein the integration by the integrating means is performed with respect to frequency as a predetermined variable.

(7) A speech analysis device comprising: spectral envelope extraction means for extracting a spectral envelope from an input audio signal in accordance with an input first control signal, and outputting data representing the extracted spectral envelope in accordance with an input second control signal; conversion means for converting the spectral envelope extracted by the spectral envelope extraction means so that its values become appropriate; integrating means for receiving the spectral envelope converted by the conversion means and integrating it; projection means for receiving, in accordance with an input third control signal, the spectral envelope converted by the conversion means and the spectral envelope integrated by the integrating means, and projecting the converted spectral envelope onto the integrated spectrum; and control means for outputting the first and second control signals to the spectral envelope extraction means and the third control signal to the projection means.

(8) The speech analysis device according to claim 7, wherein the conversion means comprises logarithmization means for logarithmizing and outputting the input spectral envelope data, and normalization means for normalizing the spectral envelope logarithmized by the logarithmization means.

(9) The speech analysis device according to claim 7, wherein the conversion means comprises normalization means for normalizing the input spectral envelope, and logarithmization means for logarithmizing the spectral envelope normalized by the normalization means.

(10) The speech analysis device according to claim 7, wherein the conversion means includes a logarithmization means for logarithmizing the input spectrum envelope.

(11) The speech analysis device according to claim 7, wherein the conversion means includes normalization means for normalizing the input spectrum envelope.

(12) The speech analysis device according to claim 7, wherein the integration by the integrating means is performed with respect to frequency as a predetermined variable.

(13) The speech analysis device according to claim 7, wherein the integration by the integrating means is performed with respect to a Mel scale as a predetermined variable.

(14) The speech analysis device according to claim 7, wherein the spectral envelope extraction means comprises: high-frequency emphasis means for emphasizing the high-frequency range so that the high-frequency level of the input audio signal approaches the low-frequency level; bandpass filter means for dividing the audio signal into a predetermined plurality of channels and band-limiting each channel in accordance with the first control signal; rectifying means for full-wave rectifying the audio signal of each channel band-limited by the bandpass filter means; low-pass filter means for detecting, in accordance with the first control signal, the band-limited spectral envelope from the audio signal full-wave rectified by the rectifying means; and multiplexer means for selectively outputting the outputs of the low-pass filter means in accordance with the second control signal.

(15) The speech analysis device according to claim 7, wherein the projection means comprises: first A/D conversion means for converting data from the integrating means into digital data; first buffer memory means for temporarily storing the data converted by the first A/D conversion means in accordance with the third control signal; second A/D conversion means for converting data from the conversion means into digital data; second buffer memory means for temporarily storing the data converted by the second A/D conversion means in accordance with the third control signal; and storage means for storing the data held in the second buffer memory means at an address given by the data held in the first buffer memory means.