JPH0119599B2 - Monosyllabic voice recognition system - Google Patents
Info
- Publication number
- JPH0119599B2 (application JP57034713A)
- Authority
- JP
- Japan
- Prior art keywords
- monosyllabic
- verification
- speech
- candidates
- matching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired
Description
[Detailed Description of the Invention]
(a) Technical Field of the Invention
The present invention relates to speech recognition devices, and in particular to a monosyllabic speech recognition method that recognizes unknown input monosyllabic speech from a speaker whose monosyllabic speech has been registered in the device in advance.
(b) Background of the Technology
As speech recognition technology has improved in recent years, speech recognition devices that make few recognition errors have come to be desired. The usual method converts the speaker's monosyllabic utterances into feature parameters and stores them in advance, then compares the feature parameters of an unknown input monosyllable with the stored ones and recognizes the input as the most similar registered monosyllable. However, the feature parameters of the same monosyllable vary with the manner of utterance, so even if the same monosyllable is registered several times with different ways of speaking, it is difficult to reduce errors to zero. In particular, for monosyllables whose feature parameters are prone to recognition errors, the recognition rate cannot be improved unless the matching method itself is reconsidered.
For this reason, a monosyllabic speech recognition method has been proposed that works in two stages. First, the unknown input monosyllable is matched against all registered monosyllables. Then, based on the matching results, several re-matching candidates are selected from the registered monosyllables in order of similarity to the input, and the input is re-matched against those candidates using a re-matching parameter determined by the combination of candidates, thereby improving the recognition rate. However, this re-matching method leaves room for improvement, and a remedy is desired.
(c) Purpose of the Invention
In response to the above need, the purpose of the present invention is to narrow down the number of re-matching candidates in a monosyllabic speech recognition method that uses re-matching, thereby shortening the time required for re-matching, simplifying the configuration of the device, and improving economy.
(d) Structure of the Invention
In the present invention, monosyllabic speech is registered in advance. The feature parameters of an unknown input monosyllable are DP-matched against the feature parameters of all registered monosyllables, and several re-matching candidates are selected from the registered monosyllables in descending order of similarity. The unknown input is then re-matched against those candidates using a re-matching parameter determined by the combination of candidates, and the most similar candidate is recognized as the input monosyllable.
When the candidates are selected, however, if the candidate ranked first in similarity by DP matching is a predetermined one, or if the combination of the top-ranked candidates is a predetermined one, the re-matching step is omitted and the candidate ranked first by DP matching is output as the recognition result. This shortens the monosyllabic speech recognition time and simplifies the re-matching circuit.
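The first stage described above (DP matching against every registered monosyllable, then ranking the candidates) can be sketched as follows. This is a minimal illustration, not the patent's circuit: it assumes that "DP matching" can be modeled as textbook dynamic time warping over sequences of feature vectors, and the names `dp_distance`, `rank_candidates`, and `top_n` are hypothetical.

```python
from math import dist, inf


def dp_distance(a, b):
    """Dynamic-programming (DTW) distance between two sequences of feature vectors.

    ASSUMPTION: the patent does not specify its DP matching; this is the
    textbook recurrence with Euclidean local distance.
    """
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rank_candidates(unknown, templates, top_n=3):
    """Return the labels of the top_n registered monosyllables, most similar first.

    `templates` maps a label to its registered feature-vector sequence.
    A smaller DP distance is treated as a higher similarity.
    """
    scored = sorted((dp_distance(unknown, frames), label) for label, frames in templates.items())
    return [label for _, label in scored[:top_n]]
```

The returned list plays the role of the ordered re-matching candidates: its first element is the first-place candidate that is output directly whenever re-matching is omitted.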
(e) Embodiment of the Invention
The figure is a block diagram of a circuit showing one embodiment of the present invention. To register monosyllabic speech in advance, the speaker first connects the switching unit 3 to the parameter storage unit 4 under the control of the control unit 8 and applies monosyllabic speech at the input. The preprocessing unit 1 performs speech level adjustment, analog-to-digital conversion, and the like, and passes the result to the parameter extraction unit 2, which extracts the feature parameters of the monosyllable and stores them in the parameter storage unit 4.
Next, to have a monosyllable recognized, the speaker connects the switching unit 3 to the storage unit 5 under the control of the control unit 8 and utters the monosyllable. By the same operation as above, the feature parameters of the unknown input monosyllable reach the storage unit 5 via the preprocessing unit 1, the parameter extraction unit 2, and the switching unit 3. Under the control of the control unit 8, they are DP-matched in the matching unit 6 against the feature parameters of all monosyllables stored in the parameter storage unit 4. The monosyllable whose feature parameters are most similar is selected as the first-place re-matching candidate, further re-matching candidates are then selected in order, and the candidates are sent to the determination unit 7.
The determination unit 7 compares the first-place candidate, or the combination of the top-ranked candidates, with the entries stored in advance in a table. If a matching entry exists, the re-matching step is omitted and the candidate ranked first by the matching unit 6 is selected, via the control unit 8, as the recognition result. If the first-place candidate or the combination of top-ranked candidates is not a predetermined one, the candidates are sent from the determination unit 7 to the control unit 8 so that they can be re-matched.
The control unit 8 causes the feature parameters of the re-matching candidates to be sent from the parameter storage unit 4 to the multiplier 10, and the feature parameters of the unknown input monosyllable held in the storage unit 5 to be sent to the multiplier 11. The determination unit 7 causes the frequency weight storage unit 12 to send to the multipliers 10 and 11 the re-matching parameter determined by the candidates, namely weights that emphasize the frequency band components suited to distinguishing the candidates from one another and attenuate the other frequency band components. The determination unit 7 also causes the threshold storage unit 13 to send to the re-matching unit 9 a threshold, which is a parameter that determines the optimal matching interval for those candidates. The re-matching unit 9 performs re-matching using the outputs of the multipliers 10 and 11 and the threshold from the threshold storage unit 13. The candidates are re-matched against the unknown input monosyllable in order, starting from the first candidate, and the most similar candidate is sent from the control unit 8 to the output as the recognition result.
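The re-matching path through the multipliers 10 and 11 and the re-matching unit 9 can be sketched roughly as below. Several details are assumptions that the text does not confirm: the threshold is modeled as an energy gate that keeps only frames whose summed magnitude reaches it, and the comparison itself uses a placeholder frame-by-frame distance. All function names are hypothetical.

```python
from math import dist, inf


def weight_frames(frames, weights):
    """Scale every frame band by band (the role of multipliers 10 and 11)."""
    return [[w * x for w, x in zip(weights, frame)] for frame in frames]


def matching_interval(frames, threshold):
    """ASSUMPTION: the matching interval is the set of frames whose summed
    magnitude reaches the threshold; fall back to all frames if none do."""
    kept = [f for f in frames if sum(abs(x) for x in f) >= threshold]
    return kept or frames


def framewise_distance(a, b):
    """Placeholder distance: mean Euclidean distance over index-aligned frames."""
    n = min(len(a), len(b))
    if n == 0:
        return inf
    return sum(dist(x, y) for x, y in zip(a, b)) / n


def rematch(unknown, candidates, templates, weights, threshold, distance=framewise_distance):
    """Re-match the unknown input against each candidate and return the best label.

    `weights` and `threshold` stand for the values that would be read from the
    frequency weight storage unit 12 and the threshold storage unit 13 for
    this particular combination of candidates.
    """
    u = matching_interval(weight_frames(unknown, weights), threshold)
    best_label, best_score = None, inf
    for label in candidates:
        t = matching_interval(weight_frames(templates[label], weights), threshold)
        score = distance(u, t)
        if score < best_score:
            best_label, best_score = label, score
    return best_label
```

With flat weights the result follows the unweighted distance, while weights that emphasize the band in which two candidates differ can change the outcome, which is the effect the frequency weights are meant to have.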
In this embodiment, examples of the candidates stored in the predetermined table of the determination unit 7, for which the re-matching step is omitted, are as follows.
- Candidates for which the recognition rate of the matching unit 6 is already extremely high, for example when the first-place re-matching candidate is one of the following:
  - a monosyllable whose consonant begins with /w/, such as ワ (wa);
  - a monosyllable whose consonant begins with /j/, such as ヤ (ya), ユ (yu), or ヨ (yo).
- Combinations that are unlikely to occur as the first- and second-place re-matching candidates, for example the following:
  - monosyllables whose consonants begin with /b/ and /t/, such as バ (ba) and タ (ta);
  - monosyllables whose consonants begin with /d/ and /k/, such as ダ (da) and カ (ka);
  - monosyllables whose consonants begin with /m/ and /h/, such as マ (ma) and ハ (ha);
  - monosyllables whose consonants begin with /g/ and /p/, such as ガ (ga) and パ (pa).

  /b/ and /t/, /d/ and /k/, /m/ and /h/, and /g/ and /p/ are unlikely to be mistaken for each other. An utterance that yields one of these combinations is therefore regarded as not needing re-matching, and the result of the matching unit 6 is used as is.
- Candidates for which re-matching has little effect:
  - a monosyllable whose consonant begins with /r/, such as ラ (ra). /r/ is unstable and varies widely, and it tends to be mistaken for various other monosyllables. Re-matching therefore brings little improvement, and at present its cost-performance is poor, so omitting re-matching is more advantageous.
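The table lookup performed by the determination unit 7 can be sketched with the example entries listed above. This is only an illustration: the table is keyed by initial consonant, a pair is treated as unordered, and the romanized labels, the `consonant_of` mapping, and the `rematch` callback are all hypothetical.

```python
# First-place consonants for which re-matching is skipped:
# /w/ and /j/ (already recognized reliably) and /r/ (re-matching helps little).
SKIP_FIRST = {"w", "j", "r"}

# First/second-place consonant pairs that are unlikely to be confused.
# ASSUMPTION: a pair is unordered, so frozenset is used as the key.
SKIP_PAIRS = {frozenset(p) for p in [("b", "t"), ("d", "k"), ("m", "h"), ("g", "p")]}


def can_skip_rematch(ranked_consonants):
    """Return True if the ranked candidates match an entry in the skip table."""
    if ranked_consonants[0] in SKIP_FIRST:
        return True
    if len(ranked_consonants) >= 2 and frozenset(ranked_consonants[:2]) in SKIP_PAIRS:
        return True
    return False


def recognize(ranked, consonant_of, rematch):
    """Output the first-place candidate directly, or fall back to re-matching.

    `ranked` lists candidate labels, most similar first; `consonant_of` maps a
    label to its initial consonant; `rematch` is called only when needed.
    """
    consonants = [consonant_of[label] for label in ranked]
    if can_skip_rematch(consonants):
        return ranked[0]
    return rematch(ranked)
```

For example, a ranking that starts with "ba" followed by "ta" returns "ba" without calling `rematch`, whereas "ba" followed by "da" is passed on to the re-matching stage.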
(f) Effects of the Invention
As explained above, in a monosyllabic speech recognition method that uses re-matching, the present invention narrows down the number of re-matching candidates, which shortens the time required for re-matching and makes it possible to simplify the components involved in the re-matching operation. Economy is improved as a result, so the effect of the invention is considerable.
The figure is a block diagram of a circuit showing one embodiment of the present invention. 1 is the preprocessing unit, 2 is the parameter extraction unit, 3 is the switching unit, 4 is the parameter storage unit, 5 is the storage unit, 6 is the matching unit, 7 is the determination unit, 8 is the control unit, 9 is the re-matching unit, 10 and 11 are multipliers, 12 is the frequency weight storage unit, and 13 is the threshold storage unit.
Claims (1)
1. A monosyllabic speech recognition method for a speech recognition device that matches all previously registered monosyllabic speech against unknown input monosyllabic speech, selects a plurality of re-matching candidates from the registered monosyllabic speech based on the matching results, and selects a re-matching parameter for each combination of re-matching candidates to re-match them against the unknown input monosyllabic speech, characterized in that, when the re-matching candidates are selected, re-matching is omitted if the re-matching candidate ranked first in similarity is a predetermined one or if the combination of the top-ranked re-matching candidates is a predetermined one.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP57034713A JPS58159600A (en) | 1982-03-05 | 1982-03-05 | Monosyllabic voice recognition system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP57034713A JPS58159600A (en) | 1982-03-05 | 1982-03-05 | Monosyllabic voice recognition system |
Publications (2)
Publication Number | Publication Date |
---|---|
JPS58159600A | 1983-09-21 |
JPH0119599B2 | 1989-04-12 |
Family
ID=12421974
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP57034713A (Granted; JPS58159600A) | Monosyllabic voice recognition system | 1982-03-05 | 1982-03-05 |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS58159600A (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS59214099A (en) * | 1983-05-20 | 1984-12-03 | Hitachi, Ltd. | Voice recognition system |
JPS63201699A (en) * | 1987-02-18 | 1988-08-19 | Hitachi, Ltd. | Voice recognition equipment |
Also Published As
Publication number | Publication date |
---|---|
JPS58159600A (en) | 1983-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7974843B2 (en) | Operating method for an automated language recognizer intended for the speaker-independent language recognition of words in different languages and automated language recognizer | |
US5862519A (en) | Blind clustering of data with application to speech processing systems | |
US5018201A (en) | Speech recognition dividing words into two portions for preliminary selection | |
US6922668B1 (en) | Speaker recognition | |
US11081115B2 (en) | Speaker recognition | |
US6574596B2 (en) | Voice recognition rejection scheme | |
Dey et al. | Exploiting sequence information for text-dependent speaker verification | |
CN1963918A (en) | Compress of speaker cyclostyle, combination apparatus and method and authentication of speaker | |
JPH0119599B2 (en) | ||
US4790017A (en) | Speech processing feature generation arrangement | |
JP2000099090A (en) | Speaker recognizing method using symbol string | |
JPS58159598A (en) | Monosyllabic voice recognition system | |
JP3098157B2 (en) | Speaker verification method and apparatus | |
JPS58159597A (en) | Monosyllabic voice recognition system | |
JPS58159599A (en) | Monosyllabic voice recognition system | |
JP2980382B2 (en) | Speaker adaptive speech recognition method and apparatus | |
JPS58159590A (en) | Monosyllabic voice recognition system | |
JPS58159595A (en) | Monosyllabic voice recognition system | |
JPH0119598B2 (en) | ||
JPS5934595A (en) | Voice recognition processing system | |
KR100476337B1 (en) | Method of Simi1ar Word Recognition for Speech Recognition Apparatus | |
Petrovska-Delacrétaz et al. | Unsupervised Data-driven Hidden Markov Modeling for Text-dependent Speaker Verification | |
JPS6346496A (en) | Voice recognition equipment | |
JPS607492A (en) | Monosyllable voice recognition system | |
JPS62255999A (en) | Word voice recognition equipment |