JPS6073598A - Voice recognition system - Google Patents

Voice recognition system

Info

Publication number
JPS6073598A
JPS6073598A JP58180247A JP18024783A
Authority
JP
Japan
Prior art keywords
syllable
length
syllables
input
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP58180247A
Other languages
Japanese (ja)
Inventor
市川 熹
畑岡 信夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd filed Critical Hitachi Ltd
Priority to JP58180247A priority Critical patent/JPS6073598A/en
Publication of JPS6073598A publication Critical patent/JPS6073598A/en
Pending legal-status Critical Current

Links

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Application of the Invention] The present invention relates to a speech recognition method, and in particular to a method for recognizing continuous utterances in phoneme units.

[Background of the Invention]

In a recognition device that must handle many kinds of words, word-unit recognition of the kind put into practical use so far poses many practical problems, both in the effort of registering standard patterns and in recognition capability. Techniques that recognize speech in phoneme or syllable units have therefore attracted attention. Phonemes consist of vowels and consonants, and it is known that phonemes in continuous speech are greatly deformed by the preceding and following phonemes (coarticulation).

In general, the influence of a vowel on a consonant is greater than that of a consonant on a vowel. To recognize speech with these influences taken into account, one could adopt standard patterns whose units reflect the preceding and following phonemes, but the number of combinations becomes so large that this is impractical. One method therefore first recognizes the vowels, which deform relatively little, and then recognizes each consonant sandwiched between recognized vowels using standard patterns for that vowel environment. Certain vowels, however (mainly /i/ and /u/), may be devoiced or dropped when sandwiched between certain consonants (mainly unvoiced consonants); this is the devoicing phenomenon.

When a vowel is devoiced, its spectral structure and other characteristics differ from those of the voiced vowel, so ordinary vowel-recognition methods have difficulty detecting it and often miss it. When a vowel within a word is missed, the consonants before and after it also become hard to recognize, and the three consecutive phonemes (consonant, vowel, consonant), or the two syllables they belong to, are misrecognized.

[Object of the Invention]

An object of the present invention is to provide a method for estimating, in continuous speech, the positions of devoiced or dropped vowels and the positions of geminate consonants (sokuon) and syllabic nasals (hatsuon).

[Summary of the Invention]

To achieve the above object, the present invention relies on the following facts: (a) in Japanese, as a rule a consonant and a vowel pair to form a syllable, and the rhythm at which syllables occur is nearly constant; (b) the speed of this rhythm varies from speaker to speaker and from occasion to occasion, but in cooperative speech its changes and fluctuations are gradual; (c) speech recognizers are basically used online, so recognition results are confirmed on the spot and errors are corrected on the spot; and (d) syllable length depends on the number of syllables in the utterance spoken at one time, and this tendency shows a certain regularity. Based on these facts, the position of a devoiced or dropped vowel is estimated by the following procedure.

(1) Set an average syllable length.

(2) Assuming an allowable syllable-length variation of about ±30% of (1) (or of (8) below), estimate the number of syllables in the first utterance. For the syllable-count estimation one can use, for example, Japanese Patent Application No. Sho 57-71230, already filed by the same inventors.

(3) Recognize the input speech. For the recognition method, too, one can use, for example, Japanese Utility Model Application No. Sho 54-91283, already filed by the same inventors.

(4) Display and confirm the recognition result. If there is an error, correct the erroneous portion. Various correction methods are possible, such as input from a keyboard.

(5) Based on the confirmation result, fix the number of syllables in the input.

(6) From the length of the input speech and the result of (5), obtain the syllable duration of the speech actually input (simply divide the total length by the number of syllables).

(7) Based on fact (d), correct the estimated syllable length. FIG. 1 shows an example of measured distributions of syllable count versus syllable length when inventor A carefully uttered about 100 words. Words of about four syllables are the most common in Japanese, so if the syllable length at that count is used as the average value, then in the example of this figure the measured value should be multiplied by about 0.85 when the input has 3 syllables and by about 1.1 when it has 5. As the figure also shows, the lengths of utterances with the same number of syllables by the same speaker vary, so a correction of roughly this precision is sufficient.

(8) Take the weighted average of the average syllable length used so far and the actually measured syllable length as the new average syllable length.

(9) Return to (2) and determine the number of syllables while correcting the estimated syllable length on the basis of the recognition results.

In the above procedure, after the number of syllables is estimated in (2), the estimated average syllable length of the utterance is obtained (simply divide the speech interval by the estimated syllable count), and the lengths of the non-voiced intervals within the speech interval are examined in turn. When an interval at least 1.5 times the estimated average syllable length exists, a devoiced or dropped vowel is assumed to lie at its center. If, however, the same acoustic characteristic continues over more than 70% of the interval from its front (for example, the spectral envelope stays the same, or a soundless interval continues), a geminate consonant (sokuon) rather than a devoiced or dropped vowel is assumed to be present (at the end of a word, devoicing/dropping is assumed).

By the above procedure, even when the speaking rate changes during use, the number of vowels, and in particular the positions of devoiced or dropped vowels, can be estimated accurately while tracking that change.

[Embodiment of the Invention]

An embodiment of the present invention will now be described with reference to the drawings.

FIG. 2 is a block diagram illustrating an embodiment of the present invention.

In FIG. 2, speech entering input terminal 1 is sent to a short-time power analysis section 2 and a spectrum analysis section 3.

Once the short-time power is obtained, its values are clipped in intervals where they exceed a fixed level, and the result is sent to the short-time power-pattern buffer register 5 and also to the control section 6. The control section regards the first vowel interval of the speech as beginning at the moment the power value exceeds a predetermined first threshold θ1, and regards the last voiced vowel interval of the input speech as having ended when the power thereafter stays below a second threshold θ2 for at least a fixed time (for example, 500 ms).

Let T0 be the length from this start point to the end point. An average vowel length t0 is recorded in memory 7 in advance; the control section determines how many syllables in the range t0 ± 30% fit within T0, thereby fixing the estimated range of syllable counts, and at the same time obtains each candidate syllable length τi within that range. The control section 6 has the rectangular-pattern generator 9 produce a rectangular pattern for each count in this estimated range, has the correlation section 8 compute the cross-correlation coefficient between each pattern and the pattern in buffer memory 5, and takes the results in. FIG. 3 shows an example: (a) is the short-time power pattern p(t) of the utterance /teuchisoba/, and (b) is the rectangular pattern t_i(t) that correlated most highly with it. With T0 as the interval length, the correlation value r_i is obtained as

r_i = (1/T0) Σ_t p(t) · t_i(t),

and FIG. 4 shows the result for this example: the correlation is maximal at 5, the correct syllable count. In this example the estimated range of syllable counts is taken wider than the average ±30% (3 to 8) so that the situation is easy to see. FIG. 5 shows a second example, the utterance /aokusa/. The vowel /u/ is devoiced and the power-pattern values there are small, yet, as FIG. 6 shows, the correlation is highest at 4, the correct syllable count. Interval A in FIG. 5(a) is long, which suggests that a vowel has been devoiced or dropped within it; the position marked ↑ on the rectangular pulses in (b) points to the same conclusion. In fact, /u/ is devoiced at this position in this example.
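The correlation r_i = (1/T0) Σ p(t)·t_i(t) between the power pattern and each candidate rectangular pattern can be sketched like this. The pattern shape assumed here, n equal pulses with a 50% duty cycle, is one plausible construction; the patent does not spell out the exact shape.

```python
def rectangular_pattern(n_syllables, length, duty=0.5):
    """n equal-width rectangular pulses over `length` samples (assumed shape)."""
    pattern = [0.0] * length
    period = length / n_syllables
    for i in range(length):
        if (i % period) / period < duty:  # front `duty` fraction of each period
            pattern[i] = 1.0
    return pattern


def correlation(power, pattern):
    """r_i = (1/T0) * sum_t p(t) * t_i(t)."""
    assert len(power) == len(pattern)
    return sum(p * q for p, q in zip(power, pattern)) / len(power)


def estimate_syllable_count(power, candidate_counts):
    """Pick the candidate count whose rectangular pattern correlates most
    strongly with the short-time power pattern."""
    return max(candidate_counts,
               key=lambda n: correlation(power,
                                         rectangular_pattern(n, len(power))))


# A synthetic power pattern with 5 energy bursts, searched over counts 3..8
# as in the patent's example:
print(estimate_syllable_count(rectangular_pattern(5, 200), range(3, 9)))  # 5
```

A gap between the chosen pattern's pulses and the actual energy bursts is exactly the kind of long low-power interval that the devoicing rule above then inspects.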

In addition to the thresholds θ1 and θ2, the control section 6 holds lower thresholds θ3 and θ4; when there is an unvoiced sound at the beginning of a word, it presumes that a vowel deformed by devoicing, dropping, or the like exists at that position as well.

Meanwhile, the input signal fed to the spectrum analysis section 3 is converted to spectral information and stored in buffer memory 10. At each analysis frame, the matching section 11 matches the spectral information in buffer memory 10 against the vowels, silence, the unvoiced portions of unvoiced fricatives, and so on held in standard-pattern memory 12, and the matching results are sent in turn to the control section 6.

From these results and the syllable-position estimates described above, the control section 6 decides whether each segment is a vowel, a geminate consonant, or a devoiced/dropped vowel. A devoiced vowel is in principle assumed to be /i/ or /u/, but in combinations that could yield words such as /kokoro/ or /haha/, /o/ or /a/ is assumed.
The control unit 6 determines a vowel, a consonant, or a devoiced/dropped vowel based on this result and the syllable position estimation result (described above). In principle, devoiced vowels are assumed to be 111 and lul, but in combinations that may result in l kokoro l, 1 haha1, etc., they are assumed to be 1.1 or 1.1.

Once the vowel candidates are determined in this way, the control section 6 sends the spectral information sequence in buffer memory 10 to the second matching section 14 while retrieving from standard-pattern memory 13 the standard patterns of consonants sandwiched between the estimated vowels; it performs the matching, takes the results in, merges them with the vowel estimates, and displays the combination on the confirmation section 15 as the syllable recognition result. The user examines the displayed result and presses the OK key if it is correct; if it is wrong, the user enters the correction from the keyboard. From the confirmation result the control section 6 obtains the correct number of syllables, obtains the average syllable length of that input speech from the interval-length information T0, further corrects that length as a function of the syllable count as explained for FIG. 1, and takes it as the estimated syllable length t0′. From this value t0′ and the average syllable length t0 used so far, a new average syllable length is obtained as the weighted average

t0 ← α·t0 + β·t0′, where α + β = 1. As a result, even if the speaking rate changes gradually during use, the average syllable length can be kept up to date while tracking that change.
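The update t0 ← α·t0 + β·t0′ with α + β = 1 is a simple exponential moving average; a one-line sketch follows (α = 0.7 is only an illustrative weight, not a value from the patent).

```python
def new_average_syllable_length(avg_so_far, measured, alpha=0.7):
    """Weighted average t0 <- alpha*t0 + (1 - alpha)*t0' (so alpha + beta = 1);
    the running average drifts toward each newly measured syllable length,
    letting it track gradual changes in the user's speaking rate."""
    return alpha * avg_so_far + (1.0 - alpha) * measured


# Running average 150 ms, newly confirmed utterance measures 180 ms/syllable:
print(new_average_syllable_length(150.0, 180.0))  # 159.0...
```

A larger α makes the estimate more stable against outlier utterances; a smaller α makes it follow rate changes faster.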

According to measured results, except for extreme outliers, the syllable duration of successively uttered speech does not vary by more than ±30% of the weighted average duration up to that point (mostly it stays within 20%), so the rectangular waves correlated with the short-time power pattern of the input waveform need only cover the range within ±30% of the weighted average duration. This not only shortens processing time but also greatly reduces syllable-duration estimation errors; as a result, recognition performance improves substantially and an easy-to-use speech recognition device is obtained.

The corrected average syllable length t0 is stored in memory 7 and used for the next input. The confirmed recognition result is output from terminal 17 via the output section 16.

Note that diphthongization, the influence of syllabic nasals, and the like may put the estimated syllable count off by about ±1, but most of the resulting syllable-length estimation errors are within 25% and do not hinder the estimation of devoiced or dropped vowels or of geminate consonants.

[Effects of the Invention]

As explained above, the present invention makes it possible to estimate the positions of devoiced or dropped vowels while tracking a speaking rate that changes gradually during use, and thus provides a speech recognition method that has high recognition capability and is easy to use.

[Brief Description of the Drawings]

FIG. 1 illustrates the relationship between the number of syllables in an utterance and syllable duration; FIG. 2 is a block diagram of an embodiment of the present invention; and FIGS. 3 to 6 show examples explaining the estimation of syllable count and of devoicing position. 6 ... control section.

Claims (1)

[Claims]

1. A speech recognition method comprising means for estimating, on the basis of average syllable-length information, the syllable lengths constituting input speech, and means for confirming the recognition result of the input speech, wherein the number of syllables of the input speech obtained from said confirmation is used to obtain the input syllable length of the input speech, said average syllable length is corrected according to the input syllable length, and the corrected value is used as the average syllable-length information for the next input speech.

2. The speech recognition method according to claim 1, wherein the positions of devoiced vowels, dropped vowels, geminate consonants (sokuon), and syllabic nasals (hatsuon) are estimated using the estimated syllable-length information.
JP58180247A 1983-09-30 1983-09-30 Voice recognition system Pending JPS6073598A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58180247A JPS6073598A (en) 1983-09-30 1983-09-30 Voice recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58180247A JPS6073598A (en) 1983-09-30 1983-09-30 Voice recognition system

Publications (1)

Publication Number Publication Date
JPS6073598A true JPS6073598A (en) 1985-04-25

Family

ID=16079935

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58180247A Pending JPS6073598A (en) 1983-09-30 1983-09-30 Voice recognition system

Country Status (1)

Country Link
JP (1) JPS6073598A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6352200A (en) * 1986-08-22 1988-03-05 株式会社日立製作所 Voice recognition equipment


Similar Documents

Publication Publication Date Title
US6304844B1 (en) Spelling speech recognition apparatus and method for communications
JP4085130B2 (en) Emotion recognition device
US7783484B2 (en) Apparatus for reducing spurious insertions in speech recognition
US20090313016A1 (en) System and Method for Detecting Repeated Patterns in Dialog Systems
JPS58102299A (en) Partial unit voice pattern generator
CN107610691B (en) English vowel sounding error correction method and device
JP3311460B2 (en) Voice recognition device
JP4953767B2 (en) Speech generator
WO1997040491A1 (en) Method and recognizer for recognizing tonal acoustic sound signals
JP5754141B2 (en) Speech synthesis apparatus and speech synthesis program
Digalakis et al. Large vocabulary continuous speech recognition in greek: corpus and an automatic dictation system.
JPH11184491A (en) Voice recognition device
JP4239479B2 (en) Speech recognition apparatus, speech recognition method, and speech recognition program
JP2008026721A (en) Speech recognizer, speech recognition method, and program for speech recognition
JP2010060846A (en) Synthesized speech evaluation system and synthesized speech evaluation method
JPS6073598A (en) Voice recognition system
Blomberg Synthetic phoneme prototypes in a connected-word speech recognition system
JPS60129796A (en) Sillable boundary detection system
JP2006010739A (en) Speech recognition device
JP4313724B2 (en) Audio reproduction speed adjustment method, audio reproduction speed adjustment program, and recording medium storing the same
JP2001331191A (en) Device and method for voice synthesis, portable terminal and program recording medium
CN111383620B (en) Audio correction method, device, equipment and storage medium
Peng et al. An innovative prosody modeling method for Chinese speech recognition
Gibson et al. Speech signal processing
Sinha et al. Exploring the role of pitch-adaptive cepstral features in context of children's mismatched ASR