JPS61190400A - Enunciation speed estimate apparatus - Google Patents

Enunciation speed estimate apparatus

Info

Publication number
JPS61190400A
Authority
JP
Japan
Prior art keywords
unit
speech
similarity
autocorrelation
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP60030183A
Other languages
Japanese (ja)
Other versions
JPH0588478B2 (en)
Inventor
晋太 木村
小林 敦仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to JP60030183A priority Critical patent/JPS61190400A/en
Publication of JPS61190400A publication Critical patent/JPS61190400A/en
Publication of JPH0588478B2 publication Critical patent/JPH0588478B2/ja
Granted legal-status Critical Current

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Summary]

This invention estimates the speech rate of speech, specifically the rate at which the five vowels and the moraic nasal are uttered. From the similarity time series of the five vowels and the moraic nasal of the input speech, the maximum value at each time is taken; the maximum point of the autocorrelation of the resulting time series is extracted; and the reciprocal of that maximum point is taken as the speech rate, giving the number of vowels and moraic nasals uttered per unit time.

[Field of Industrial Application]

This invention belongs to the field of speech processing devices, and in particular relates to a speech rate estimating device that estimates the number of five-vowel and moraic-nasal units uttered per unit time.

[Prior Art and Problems to Be Solved by the Invention]

A conventional method for estimating speech rate is explained with reference to FIGS. 3(a) to 3(e). In FIG. 3(a), the vertical axis is speech energy (E) and the horizontal axis is time (t). If, for example, "AKADA" (アカダ) is uttered, the energy of the vowel portions rises within the speech interval (T) and three peaks appear. Let this speech energy waveform be the function E = g(t). Cosine series expansion coefficients are fitted using sinusoidal functions f(cos θ) of, for example, 1 to 4 cycles, as shown in FIGS. 3(b) to 3(e), and the correlation between the speech energy waveform and each sinusoidal function is obtained. That is, over the short intervals that make up the speech interval T, the product of g(t) and f(cos θ) is computed in time order and accumulated; this inner product is called the expansion coefficient. When the inner product of g(t) with the 1-cycle waveform of (b) is computed, the waveform in portion A is badly misaligned, as the figure makes clear, so the absolute value of the expansion coefficient is small. Likewise, for (c) and (e) the waveform in portion A is misaligned and the absolute value of the expansion coefficient is small. In the 3-cycle case of (d), on the other hand, the waveform at A nearly coincides with a negative peak, so the absolute value is large.

Inner products are computed in the same way for four cycles and beyond. The absolute value is largest when the sinusoid is synchronized with g(t) in either the positive or the negative direction, so that number of cycles (the frequency) indicates the number of vowels per second (morae/second) in the speech interval T.
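The conventional procedure described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the 10 ms frame spacing, and the synthetic three-peak energy envelope are all assumptions made for the example.

```python
import math


def cosine_coefficient(energy, cycles):
    """Inner product of the energy series with a cosine that completes
    `cycles` full cycles over the speech interval."""
    n = len(energy)
    return sum(
        e * math.cos(2.0 * math.pi * cycles * i / n)
        for i, e in enumerate(energy)
    )


def best_cycle_count(energy, max_cycles):
    """Cycle count whose coefficient has the largest absolute value."""
    return max(
        range(1, max_cycles + 1),
        key=lambda c: abs(cosine_coefficient(energy, c)),
    )


# Hypothetical energy envelope: three evenly spaced peaks in 300 frames.
FRAME_PERIOD_SEC = 0.010  # assumed 10 ms frame spacing
energy = [1.0 - math.cos(2.0 * math.pi * 3 * i / 300) for i in range(300)]

cycles = best_cycle_count(energy, max_cycles=6)
rate = cycles / (len(energy) * FRAME_PERIOD_SEC)  # cycles per second
print(cycles, rate)
```

In this synthetic case the peaks line up with the negative half-cycles of the 3-cycle cosine, so the coefficient is large in magnitude but negative, which is why the absolute value is compared.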

Thus, conventionally, for the time series of energy values taken at short intervals of, for example, several tens of milliseconds within the speech interval, the frequency of the cosine function whose series expansion coefficient has the largest absolute value was used as the speech rate.

However, the conventional method relies on the property that each mora (the five vowels and the moraic nasal) forms a peak in the energy time series, and real speech does not always form clear peaks and valleys. In "AI" (アイ), for example, the two vowels produce one continuous peak, so a sinusoidal function does not fit well, the frequency cannot be determined, and the speech rate may not be extracted accurately.

[Means and Operation for Solving the Problems]

The present invention is a speech rate estimating device that solves the above problem. The time series formed by the maximum of the similarity time series of the mora-forming phonemes (the five vowels and the moraic nasal) is treated as a mora position time series; the maximum point of its autocorrelation function is extracted as the mora repetition time; and the reciprocal of that time is taken as the speech rate. To this end, the speech rate estimating device, used in a speech processing device, comprises: an acoustoelectric conversion unit that converts a speech wave into an electrical signal; an input unit that converts the electrical signal into a digital signal; a feature extraction unit that extracts features from the digital signal as a time series; a dictionary unit in which the features of the five vowels and the moraic nasal are stored in advance; a similarity calculation unit that calculates, as a time series, the similarity between the output of the feature extraction unit and the features of each sound read from the dictionary unit; an optimum value selection unit that selects the optimum value of the similarities of the sounds at each time; an autocorrelation calculation unit that calculates the autocorrelation of the output time series of the optimum value selection unit over the entire speech interval; a maximum extraction unit that extracts the maximum point of the autocorrelation; and a reciprocal calculation unit that calculates the reciprocal of the maximum point. The reciprocal is taken as the speech rate per unit time.

[Embodiment]

FIG. 1 is a block diagram of an embodiment of the speech rate estimating device according to the present invention. In FIG. 1, speech input through microphone 1, which serves as the acoustoelectric conversion unit, is digitized by input unit 2. Feature extraction unit 3 performs frequency analysis every few tens of milliseconds and extracts the energies in about 16 frequency bands as a feature vector. Dictionary unit 4 stores, for each phoneme, a feature vector obtained by applying the same feature extraction to the five vowels and the moraic nasal. Similarity calculation unit 5 calculates the similarity ($\rho$) between the feature vector $(x_i)$ extracted by feature extraction unit 3 and the feature vector $(y_i)$ of each phoneme in dictionary unit 4, for example as in equation (1). [Equation (1) is not reproduced in this text.] Here $\rho^A$ denotes the similarity to "A".

The feature vector $(x_i)$ is obtained once per analysis period, every few tens of milliseconds, so the similarities $\rho^A$ through $\rho^N$ are also calculated each time a feature vector is obtained. At time $j$, this yields the similarity time series of each phoneme: $\rho_j^A, \rho_j^I, \rho_j^U, \rho_j^E, \rho_j^O, \rho_j^N$.
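As a rough illustration of how such per-phoneme similarity time series could be produced, the sketch below compares each frame's feature vector with a dictionary of reference vectors. The patent's equation (1) is not available in this text, so cosine similarity is used here purely as an assumed stand-in; the function names and the 3-band toy vectors (the text describes about 16 bands) are likewise hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Assumed stand-in for equation (1): normalized inner product."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0.0 else 0.0


def similarity_series(frames, dictionary):
    """Map each phoneme label to its similarity at every frame j."""
    return {
        phoneme: [cosine_similarity(frame, reference) for frame in frames]
        for phoneme, reference in dictionary.items()
    }


# Toy dictionary with 3-band vectors (hypothetical values).
dictionary = {
    "A": [1.0, 0.2, 0.1],
    "I": [0.1, 1.0, 0.2],
    "N": [0.2, 0.1, 1.0],
}
# Two input frames: the first resembles "A", the second resembles "I".
frames = [[1.0, 0.2, 0.1], [0.1, 1.0, 0.2]]

series = similarity_series(frames, dictionary)
print(series["A"], series["I"])
```

Each list in `series` plays the role of one of the per-phoneme series, with one value per analysis frame.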

The optimum value selection unit 6 performs the following calculation.

$$\rho_j = \max\left(\rho_j^A, \rho_j^I, \rho_j^U, \rho_j^E, \rho_j^O, \rho_j^N\right) \qquad (2)$$

Here, max indicates that the maximum value of its arguments is selected as the optimum value.
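The selection step takes, at each time $j$, the largest of the per-phoneme similarities. A minimal sketch, assuming the similarities are held in a dictionary of equal-length lists (a data layout chosen only for this example):

```python
def select_max(series):
    """At each frame j, keep the largest similarity across all phonemes."""
    return [max(column) for column in zip(*series.values())]


# Hypothetical similarity values for three phonemes over three frames.
series = {
    "A": [0.9, 0.3, 0.2],
    "I": [0.1, 0.8, 0.3],
    "N": [0.2, 0.4, 0.7],
}

rho = select_max(series)
print(rho)  # [0.9, 0.8, 0.7]
```

The result no longer records which phoneme won at each frame, only that some mora-forming phoneme matched well there, which is why no explicit phoneme recognition is needed.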

The autocorrelation calculation unit 7 calculates the autocorrelation function of the $\rho_j$ computed in the preceding stage according to equation (3). [Equation (3) is not reproduced in this text.]
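Because the patent's own autocorrelation formula is not available in this text, the sketch below uses one common definition as an assumption: the series is mean-removed, products are summed over the overlapping frames, and the result is normalized by the zero-lag value. The normalization and mean removal are choices made for this example, not details taken from the patent.

```python
def autocorrelation(rho, max_lag):
    """Mean-removed autocorrelation for lags 0..max_lag, normalized so
    that the zero-lag value is 1 (an assumed definition)."""
    n = len(rho)
    mean = sum(rho) / n
    centered = [r - mean for r in rho]
    zero_lag = sum(c * c for c in centered)
    if zero_lag == 0.0:
        return [0.0] * (max_lag + 1)
    return [
        sum(centered[j] * centered[j + k] for j in range(n - k)) / zero_lag
        for k in range(max_lag + 1)
    ]


# Hypothetical mora position series: one peak every 4 frames.
rho = [1.0, 0.0, 0.0, 0.0] * 5
nu = autocorrelation(rho, max_lag=8)
print([round(v, 3) for v in nu])
```

For this periodic input the value at lag 4 stands out from its neighbours, reflecting the 4-frame repetition of the peaks.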

The maximum extraction unit 8 extracts, from the values $\nu_k$ ($k = 1, \dots, N$), the largest peak other than the one at $k = 0$ and takes its position as $k_{\max}$. This is multiplied by the analysis period $T$ (sec), as in equation (4), to obtain the average duration $L$ of one mora.

$$L = T \cdot k_{\max} \qquad (4)$$

The reciprocal calculation unit 9 calculates the reciprocal of $L$, as in the following equation, to obtain the speech rate $S$ (morae/second).

$$S = 1/L \qquad (5)$$

FIGS. 2(a) to 2(h) illustrate an example of the processing procedure described above, taking the utterance "GINZANO" (ギンザの) as an example. In the figures, the vertical axis represents the speech energy value and the horizontal axis ($j$) represents time (for example, at 10 msec intervals).

In (a), the similarity $\rho_j^A$ at time $j$ is high in the "A" portions of the utterance "GINZANO". Likewise, the places where the similarities to "I", "U", ..., "N" are high appear as the peaks of the waveforms in (b) to (f). These similarities are calculated in similarity calculation unit 5 based on equation (1) above.

The similarity calculation is then repeated while $j$ is stepped from the first frame through the $n$-th, that is, every 10 msec, and the maximum over "A", "I", ..., "N" is taken at each step. This calculation is expressed by equation (2) and is performed in optimum value selection unit 6. The result is a waveform like the one shown in (g). Calculating the autocorrelation function of this waveform based on equation (3) gives the waveform shown in (h). Here the vertical axis ($\nu_k$) is the autocorrelation value and the horizontal axis ($k$) is the shift amount, where a shift amount of $k$ means that the waveform in (g) is shifted in time by $k$. The waveform in (h) is thus obtained by correlating the waveform of (g), shifted by successive values of $k$, with the unshifted waveform of (g) at $k = 0$. The correlation is naturally large at $k = 0$, where there is no shift, so the point $k_{\max}$ is taken as the largest value excluding $k = 0$. This $k_{\max}$ is multiplied by the analysis period $T$, as in equation (4), to obtain the average duration $L$ per mora, and reciprocal calculation unit 9 computes the reciprocal $S$ of $L$ to obtain the speech rate.
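The last steps of the procedure (autocorrelation, peak extraction, and equations (4) and (5)) can be sketched end to end as follows. Several details here are assumptions rather than statements from the patent: the autocorrelation is mean-removed and normalized by its zero-lag value, "largest peak other than k = 0" is read as the largest local maximum at a nonzero lag, and the input is a synthetic series with one peak every 15 frames at an assumed 10 ms analysis period.

```python
import math


def autocorrelation(rho, max_lag):
    """Mean-removed, zero-lag-normalized autocorrelation (assumed form)."""
    n = len(rho)
    mean = sum(rho) / n
    centered = [r - mean for r in rho]
    zero_lag = sum(c * c for c in centered)
    return [
        sum(centered[j] * centered[j + k] for j in range(n - k)) / zero_lag
        for k in range(max_lag + 1)
    ]


def find_k_max(nu):
    """Lag of the largest local maximum of nu, excluding k = 0."""
    peaks = [
        k for k in range(1, len(nu) - 1)
        if nu[k - 1] < nu[k] >= nu[k + 1]
    ]
    if not peaks:
        raise ValueError("no autocorrelation peak found at a nonzero lag")
    return max(peaks, key=lambda k: nu[k])


def estimate_speech_rate(rho, analysis_period_sec, max_lag):
    """Return (k_max, morae per second) for a mora position series."""
    k_max = find_k_max(autocorrelation(rho, max_lag))
    mora_duration = analysis_period_sec * k_max  # equation (4): L = T * k_max
    return k_max, 1.0 / mora_duration            # equation (5): S = 1 / L


# Synthetic series (hypothetical): one similarity peak every 15 frames.
rho = [0.5 + 0.5 * math.cos(2.0 * math.pi * j / 15) for j in range(150)]
k_max, rate = estimate_speech_rate(rho, analysis_period_sec=0.010, max_lag=40)
print(k_max, round(rate, 2))
```

Searching for a local maximum rather than the plain largest value matters for smooth input like this one: the value at lag 1 can exceed the value at the true repetition lag simply because neighbouring frames are similar.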

[Effects of the Invention]

According to the present invention, the speech rate of an utterance can be estimated accurately without performing explicit phoneme recognition.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an embodiment of the speech rate estimating device according to the present invention; FIGS. 2(a) to 2(h) are diagrams explaining the processing procedure of the device of FIG. 1; and FIGS. 3(a) to 3(e) are diagrams explaining a conventional method for estimating speech rate.

(Explanation of reference numerals) 1: microphone, 2: input unit, 3: feature extraction unit, 4: dictionary unit, 5: similarity calculation unit, 6: optimum value selection unit, 7: autocorrelation calculation unit, 8: maximum extraction unit, 9: reciprocal calculation unit.

Claims (1)

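[Claims]

1. A speech rate estimating device in a speech processing device, comprising: an acoustoelectric conversion unit that converts a speech wave into an electrical signal; an input unit that converts the electrical signal into a digital signal; a feature extraction unit that extracts features from the digital signal as a time series; a dictionary unit in which the features of the five vowels and the moraic nasal are stored in advance; a similarity calculation unit that calculates, as a time series, the similarity between the output of the feature extraction unit and the features of each sound read from the dictionary unit; an optimum value selection unit that selects the optimum value of the similarities of the sounds at each time; an autocorrelation calculation unit that calculates the autocorrelation of the output time series of the optimum value selection unit over the entire speech interval; a maximum extraction unit that extracts the maximum point of the autocorrelation; and a reciprocal calculation unit that calculates the reciprocal of the maximum point, wherein the reciprocal is taken as the speech rate per unit time.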
an acousto-electric conversion unit that converts a speech wave into an electrical signal; an input unit that converts the electrical signal into a digital signal; a feature extraction unit that extracts features in time series from the digital signal; a similarity calculation unit that performs time-series calculation of similarity between the output of the feature extraction unit and the feature of each sound read from the dictionary unit; an optimal value selection section that selects the optimal value of sound similarity; an autocorrelation calculation section that calculates the autocorrelation of the output time series of the optimal value selection section over the entire speech interval; and an autocorrelation calculation section that calculates the maximum point of the autocorrelation. A speech rate estimating device comprising: a maximum extraction unit that extracts; and a reciprocal calculation unit that calculates a reciprocal of the maximum point; the reciprocal is used as a vocalization rate per unit time.
JP60030183A 1985-02-20 1985-02-20 Enunciation speed estimate apparatus Granted JPS61190400A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP60030183A JPS61190400A (en) 1985-02-20 1985-02-20 Enunciation speed estimate apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP60030183A JPS61190400A (en) 1985-02-20 1985-02-20 Enunciation speed estimate apparatus

Publications (2)

Publication Number Publication Date
JPS61190400A (en) 1986-08-25
JPH0588478B2 (en) 1993-12-22

Family

ID=12296643

Family Applications (1)

Application Number Title Priority Date Filing Date
JP60030183A Granted JPS61190400A (en) 1985-02-20 1985-02-20 Enunciation speed estimate apparatus

Country Status (1)

Country Link
JP (1) JPS61190400A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05289691A (en) * 1992-04-10 1993-11-05 Nippon Telegr & Teleph Corp <Ntt> Speech speed measuring instrument
JP2002221976A (en) * 2001-01-24 2002-08-09 Yamaha Corp Speech speed detecting method and voice signal processor

Also Published As

Publication number Publication date
JPH0588478B2 (en) 1993-12-22

Similar Documents

Publication Publication Date Title
Talkin et al. A robust algorithm for pitch tracking (RAPT)
JP2763322B2 (en) Audio processing method
US8889976B2 (en) Musical score position estimating device, musical score position estimating method, and musical score position estimating robot
EP1667108B1 (en) Speech synthesis system, speech synthesis method, and program product
JP2009042716A (en) Cyclic signal processing method, cyclic signal conversion method, cyclic signal processing apparatus, and cyclic signal analysis method
CN104123934A (en) Speech composition recognition method and system
JP2005266797A (en) Method and apparatus for separating sound-source signal and method and device for detecting pitch
US11557287B2 (en) Pronunciation conversion apparatus, pitch mark timing extraction apparatus, methods and programs for the same
JP5325130B2 (en) LPC analysis device, LPC analysis method, speech analysis / synthesis device, speech analysis / synthesis method, and program
JPS61190400A (en) Enunciation speed estimate apparatus
JPH0777979A (en) Speech-operated acoustic modulating device
CN103778914A (en) Anti-noise voice identification method and device based on signal-to-noise ratio weighing template characteristic matching
CN109697985B (en) Voice signal processing method and device and terminal
JP3500690B2 (en) Audio pitch extraction device and audio processing device
JPS58108590A (en) Voice recognition equipment
d’Alessandro et al. Phase-based methods for voice source analysis
JP4313740B2 (en) Reverberation removal method, program, and recording medium
JP4882152B2 (en) Speech speed detection method and audio signal processing apparatus
JPS61128300A (en) Pitch extractor
Razak et al. A preliminary speech analysis for recognizing emotion
JPH03216699A (en) Sound source data generating method of sound synthesizer
CN113450768A (en) Speech synthesis system evaluation method and device, readable storage medium and terminal equipment
Toma et al. Recognition of English vowels in isolated speech using characteristics of Bengali accent
JPH04253100A (en) Sound source data generating method of voice synthesizer
Mamat et al. Mandarin syllables speech trainer based on F1 and F2 formant frequencies

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees