JPS61190400A - Enunciation speed estimate apparatus - Google Patents
Enunciation speed estimate apparatus - Info
- Publication number
- JPS61190400A (Application JP60030183A)
- Authority
- JP
- Japan
- Prior art keywords
- unit
- speech
- similarity
- autocorrelation
- calculation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Abstract
(57) [Abstract] This publication contains application data filed before the electronic filing system; therefore, no abstract data is recorded.
Description
[Detailed Description of the Invention]

[Overview]

This invention estimates the utterance rate of speech, in particular of the five vowels and the syllabic nasal (撥音). From the similarity time series of the five vowels and the syllabic nasal of the input speech, the maximum value at each time is selected; the maximum of the autocorrelation of the selected time series is extracted; and the reciprocal of the lag at that maximum is taken as the utterance rate, giving the per-unit-time utterance rate of the five vowels and the syllabic nasal.
The present invention belongs to the field of speech processing devices, and in particular relates to an utterance rate estimating device that estimates the per-unit-time utterance rate of the five vowels and the syllabic nasal.
A conventional method of estimating utterance rate is explained with reference to Figs. 3(a) to 3(e). In Fig. 3(a) the vertical axis is speech energy (E) and the horizontal axis is time (t). When, for example, 「アカダ (AKADA)」 is uttered, the energy of the vowel portions rises within the speech interval (T) and three peaks appear. Taking this speech waveform as a function E = g(t), cosine-series expansion coefficients are fitted using sinusoidal functions f(cos θ) of, for example, 1 to 4 cycles as shown in Figs. 3(b) to 3(e), and the correlation between the speech-energy waveform and each sinusoidal function is obtained. That is, over small subintervals of the speech interval T, the products of the function g(t) and the function f(cos θ) are formed in time sequence and accumulated; this inner product is called the expansion coefficient. When the inner product of the 1-cycle waveform of Fig. 3(b) with g(t) is computed, the waveform deviates considerably at portion A, as is clear from the figure, so the absolute value of the expansion coefficient becomes small. In this way, for Figs. 3(b) and 3(e) as well, the waveform deviates at portion A and the absolute value of the expansion coefficient becomes small. In the 3-cycle case of Fig. 3(d), on the other hand, the waveform at A nearly coincides with a negative peak, so the absolute value becomes large.
Inner products are computed in the same way for 4 cycles and beyond. Since the absolute value becomes maximum when the sinusoid is synchronized with g(t) in either the positive or the negative direction, the cycle count (frequency) at that maximum indicates the number of vowels per second in the speech interval T (morae/second).
Thus, conventionally, for the time series of energy values taken every several tens of milliseconds as small subintervals of the speech interval, the frequency of the cosine expansion function that maximizes the absolute value of the cosine-series expansion coefficient was used as the utterance rate.
However, this conventional approach exploits the property that each mora (the five vowels and the syllabic nasal) forms a peak in the energy time series, whereas actual speech does not always form clear peaks and valleys. In 「アイ (AI)」, for example, the two successive vowels produce one continuous peak, so the sinusoidal function fits poorly, the frequency cannot be determined, and the utterance rate may not be extracted accurately.
[Means and Operation for Solving the Problems]

The present invention is an utterance rate estimating device that solves the problems described above. The maximum-value time series of the similarity time series of each mora-forming phoneme (the five vowels and the syllabic nasal) is taken as a mora-position time series; the maximum point of its autocorrelation function is extracted as the mora repetition time; and the reciprocal of that time is taken as the utterance rate. The means for this is an utterance rate estimating device in a speech processing apparatus comprising: an acoustic-electric conversion unit that converts a speech wave into an electric signal; an input unit that converts the electric signal into a digital signal; a feature extraction unit that extracts features from the digital signal as a time series; a dictionary unit in which the features of the five vowels and the syllabic nasal are stored in advance; a similarity calculation unit that computes, as a time series, the similarity between the output of the feature extraction unit and the features of each sound read from the dictionary unit; an optimal value selection unit that selects the optimal similarity value among the sounds at each time; an autocorrelation calculation unit that computes the autocorrelation of the output time series of the optimal value selection unit over the entire speech interval; a maximum extraction unit that extracts the maximum point of the autocorrelation; and a reciprocal calculation unit that computes the reciprocal of that maximum point, the reciprocal being taken as the utterance rate per unit time.
Fig. 1 is a block diagram of one embodiment of the utterance rate estimating device according to the present invention. In Fig. 1, speech input through a microphone 1 serving as the acoustic-electric conversion unit is digitized by an input unit 2, and a feature extraction unit 3 performs frequency analysis every several tens of milliseconds, extracting the energies of about 16 frequency bands as a feature vector. A dictionary unit 4 stores, for each phoneme, feature vectors obtained by extracting the features of the five vowels and the syllabic nasal in the same manner as above. A similarity calculation unit 5 computes the similarity (ρ) between the feature vector (x_i) extracted by the feature extraction unit 3 and the per-phoneme feature vectors (y_i) in the dictionary unit 4, for example according to equation (1) below. Here ρ_A is a value indicating the similarity to 「ア (A)」.
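The per-frame similarity computation can be sketched as follows. Equation (1) is not legible in this text, so a normalized inner product (cosine similarity) is assumed here, and the 16-band feature vector and the dictionary templates are synthetic placeholders rather than real speech features.

```python
import numpy as np

# Synthetic stand-ins: one 16-band energy feature vector x for a single
# analysis frame, and one stored template y per phoneme (5 vowels + N).
rng = np.random.default_rng(0)
frame = rng.random(16)
dictionary = {p: rng.random(16) for p in "AIUEON"}

def similarity(x, y):
    # Normalized inner product (cosine similarity); the exact form of
    # equation (1) is an assumption, not taken from the patent text.
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# One similarity value per phoneme for this frame; repeating this for
# every frame yields the similarity time series rho_A ... rho_N.
rho = {p: similarity(frame, y) for p, y in dictionary.items()}
```

Repeating this per analysis frame produces the six similarity tracks that the later stages operate on.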
Since the feature vector (x_i) is obtained every several tens of milliseconds, at each small subinterval of the analysis period, the similarities ρ_A to ρ_N are likewise computed each time a feature vector is obtained, so that at time j the similarity time series of each phoneme,

ρ_j^A, ρ_j^I, ρ_j^U, ρ_j^E, ρ_j^O, ρ_j^N,

is obtained.
The optimal value selection unit 6 then performs the following calculation:

ρ_j = max(ρ_j^A, ρ_j^I, ρ_j^U, ρ_j^E, ρ_j^O, ρ_j^N) ... (2)

where max denotes selecting the maximum value of its arguments as the optimal value.
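A minimal sketch of this optimal-value selection, using toy similarity tracks in place of real ones (the values and the 10 ms frame count are placeholders):

```python
import numpy as np

# Toy similarity time series for the six phonemes, one value per frame.
rng = np.random.default_rng(1)
n_frames = 200
rho = {p: rng.random(n_frames) for p in "AIUEON"}

# At each frame j, keep the largest similarity over the five vowels and
# the syllabic nasal -- the max-selection described for unit 6.
rho_max = np.max(np.stack(list(rho.values())), axis=0)
```

The resulting single track rho_max is the mora-position time series handed to the autocorrelation stage.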
The autocorrelation calculation unit 7 computes the autocorrelation function of the ρ_j obtained in the preceding stage by the following equation, the summation being taken over the speech interval:

ν_k = Σ_j ρ_j · ρ_(j+k) ... (3)
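The autocorrelation stage can be sketched as below. The exact form of equation (3) is not legible in the source, so a plain lag-product sum after mean removal is assumed; on a series with one peak every 25 frames, the largest non-zero-lag peak of the autocorrelation recovers that period.

```python
import numpy as np

def autocorr(x):
    # nu_k = sum_j x[j] * x[j + k] after mean removal; this exact form
    # of equation (3) is an assumption.
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) for k in range(n)])

# A periodic "mora position" series: one peak every 25 frames.
t = np.arange(400)
rho_max = 1.0 + np.sin(2 * np.pi * t / 25)
nu = autocorr(rho_max)

# Take the largest local maximum with k != 0. (A plain argmax over
# k >= 1 would land next to the lag-0 peak on a smooth signal, so a
# local-peak search is used here.)
peaks = [k for k in range(1, len(nu) - 1) if nu[k - 1] < nu[k] > nu[k + 1]]
k_max = max(peaks, key=lambda k: nu[k])   # recovers the 25-frame period
```

The local-peak search is a practical detail: excluding only the single point k = 0, as the text states, is not enough on smooth data, since ν_1 is nearly as large as ν_0.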
The maximum extraction unit 8 extracts from (ν_k), k = 1, ..., N, the largest peak point other than k = 0 as k_max, and multiplies it by the analysis period T (sec) according to the following equation (4) to obtain the average duration L of one mora:

L = T · k_max ... (4)

The reciprocal calculation unit 9 computes the reciprocal of L as in the following equation to obtain the utterance rate S (morae/second).
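Equations (4) and (5) reduce to a multiplication and a reciprocal; a numeric sketch, assuming a 10 ms analysis period and a best lag of 25 frames (both illustrative values, not from the patent):

```python
T = 0.01       # assumed analysis period in seconds (10 ms per frame)
k_max = 25     # assumed largest non-zero autocorrelation lag

L = T * k_max  # eq. (4): average duration of one mora, here 0.25 s
S = 1.0 / L    # eq. (5): utterance rate, here 4 morae per second
```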
S = 1/L ... (5)

Figs. 2(a) to 2(h) illustrate an example of the processing procedure described above, taking 「ギンザの (GINZANO)」 as an example. In the figures the vertical axis is the speech energy value and the horizontal axis is time (for example, at 10 msec intervals).
In Fig. 2(a) it can be seen that the similarity ρ_j^A at time j is high at the 「ア (A)」 portion of the utterance 「ギンザの (GINZANO)」. Similarly, where the similarities for 「イ (I)」, 「ウ (U)」, ..., 「ン (N)」 are high, the waveforms form peaks as shown in Figs. 2(b) to 2(f). These similarity calculations are performed in the similarity calculation unit 5 according to equation (1) above.
The similarity calculation described above is then performed with j varied successively from the 1st to the n-th frame, that is, every 10 msec, and the maximum value over 「ア (A)」, 「イ (I)」, ..., 「ン (N)」 is obtained. This calculation, expressed by equation (2), is performed in the optimal value selection unit 6, and the result is the waveform shown in Fig. 2(g). Computing the autocorrelation function of this waveform according to equation (3) yields the waveform shown in Fig. 2(h). Here the vertical axis (ν_k) is the autocorrelation value and the horizontal axis (k) is the shift amount: a shift k means shifting the waveform of Fig. 2(g) by k in time. Computing in this way the autocorrelation between the waveform of Fig. 2(g) shifted successively by k and the unshifted waveform at k = 0 gives the waveform shown in Fig. 2(h). Naturally the correlation is largest at k = 0, that is, with no shift; excluding k = 0, the next largest point k_max is obtained. The reciprocal calculation unit 9 multiplies this k_max by the analysis period T as in equation (4) to obtain the average duration L per mora, and computing the reciprocal S of this L gives the utterance rate.
According to the present invention, the utterance rate of speech can be estimated accurately without explicitly performing phoneme recognition.
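Under the same assumptions as the sketches above (cosine-style similarities, plain lag-product autocorrelation, a hypothetical 10 ms frame period), the whole pipeline of Fig. 1 can be strung together on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 0.01                    # assumed 10 ms analysis period
n = 300
t = np.arange(n)
mora_len = 20               # ground truth: one mora per 200 ms -> 5 morae/s

# Synthetic similarity tracks: all six phonemes share the mora-rate
# envelope, with mild random modulation standing in for real speech.
envelope = 0.5 + 0.5 * np.cos(2 * np.pi * t / mora_len)
rho = {p: envelope * rng.uniform(0.9, 1.0, n) for p in "AIUEON"}

# Optimal value selection (eq. 2), autocorrelation (eq. 3), largest
# non-zero-lag peak, then eqs. (4)-(5).
rho_max = np.max(np.stack(list(rho.values())), axis=0)
x = rho_max - rho_max.mean()
nu = np.array([np.dot(x[:n - k], x[k:]) for k in range(n)])
peaks = [k for k in range(1, n - 1) if nu[k - 1] < nu[k] > nu[k + 1]]
k_max = max(peaks, key=lambda k: nu[k])
S = 1.0 / (T * k_max)       # estimated utterance rate, close to 5 morae/s
```

Note that nothing in the chain identifies which phoneme produced each peak: the estimate comes purely from the repetition period of the maxima, which is why no explicit phoneme recognition is needed.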
Fig. 1 is a block diagram of one embodiment of the utterance rate estimating device according to the present invention; Figs. 2(a) to 2(h) are diagrams explaining the processing procedure of the device of Fig. 1; and Figs. 3(a) to 3(e) are diagrams explaining a conventional method of estimating utterance rate.

(Explanation of reference numerals)
1: microphone, 2: input unit, 3: feature extraction unit, 4: dictionary unit, 5: similarity calculation unit, 6: optimal value selection unit, 7: autocorrelation calculation unit, 8: maximum extraction unit, 9: reciprocal calculation unit.
Claims (1)

1. An utterance rate estimating device in a speech processing apparatus, comprising: an acoustic-electric conversion unit that converts a speech wave into an electric signal; an input unit that converts the electric signal into a digital signal; a feature extraction unit that extracts features from the digital signal as a time series; a dictionary unit in which the features of the five vowels and the syllabic nasal are stored in advance; a similarity calculation unit that computes, as a time series, the similarity between the output of the feature extraction unit and the features of each sound read from the dictionary unit; an optimal value selection unit that selects the optimal similarity value of each sound at each time; an autocorrelation calculation unit that computes the autocorrelation of the output time series of the optimal value selection unit over the entire speech interval; a maximum extraction unit that extracts the maximum point of the autocorrelation; and a reciprocal calculation unit that computes the reciprocal of that maximum point, the reciprocal being taken as the utterance rate per unit time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP60030183A JPS61190400A (en) | 1985-02-20 | 1985-02-20 | Enunciation speed estimate apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP60030183A JPS61190400A (en) | 1985-02-20 | 1985-02-20 | Enunciation speed estimate apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
JPS61190400A true JPS61190400A (en) | 1986-08-25 |
JPH0588478B2 JPH0588478B2 (en) | 1993-12-22 |
Family
ID=12296643
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP60030183A Granted JPS61190400A (en) | 1985-02-20 | 1985-02-20 | Enunciation speed estimate apparatus |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS61190400A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05289691A (en) * | 1992-04-10 | 1993-11-05 | Nippon Telegr & Teleph Corp <Ntt> | Speech speed measuring instrument |
JP2002221976A (en) * | 2001-01-24 | 2002-08-09 | Yamaha Corp | Speech speed detecting method and voice signal processor |
Also Published As
Publication number | Publication date |
---|---|
JPH0588478B2 (en) | 1993-12-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Talkin et al. | A robust algorithm for pitch tracking (RAPT) | |
JP2763322B2 (en) | Audio processing method | |
US8889976B2 (en) | Musical score position estimating device, musical score position estimating method, and musical score position estimating robot | |
EP1667108B1 (en) | Speech synthesis system, speech synthesis method, and program product | |
JP2009042716A (en) | Cyclic signal processing method, cyclic signal conversion method, cyclic signal processing apparatus, and cyclic signal analysis method | |
CN104123934A (en) | Speech composition recognition method and system | |
JP2005266797A (en) | Method and apparatus for separating sound-source signal and method and device for detecting pitch | |
US11557287B2 (en) | Pronunciation conversion apparatus, pitch mark timing extraction apparatus, methods and programs for the same | |
JP5325130B2 (en) | LPC analysis device, LPC analysis method, speech analysis / synthesis device, speech analysis / synthesis method, and program | |
JPS61190400A (en) | Enunciation speed estimate apparatus | |
JPH0777979A (en) | Speech-operated acoustic modulating device | |
CN103778914A (en) | Anti-noise voice identification method and device based on signal-to-noise ratio weighing template characteristic matching | |
CN109697985B (en) | Voice signal processing method and device and terminal | |
JP3500690B2 (en) | Audio pitch extraction device and audio processing device | |
JPS58108590A (en) | Voice recognition equipment | |
d’Alessandro et al. | Phase-based methods for voice source analysis | |
JP4313740B2 (en) | Reverberation removal method, program, and recording medium | |
JP4882152B2 (en) | Speech speed detection method and audio signal processing apparatus | |
JPS61128300A (en) | Pitch extractor | |
Razak et al. | A preliminary speech analysis for recognizing emotion | |
JPH03216699A (en) | Sound source data generating method of sound synthesizer | |
CN113450768A (en) | Speech synthesis system evaluation method and device, readable storage medium and terminal equipment | |
Toma et al. | Recognition of English vowels in isolated speech using characteristics of Bengali accent | |
JPH04253100A (en) | Sound source data generating method of voice synthesizer | |
Mamat et al. | Mandarin syllables speech trainer based on F1 and F2 formant frequencies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
LAPS | Cancellation because of no payment of annual fees |