JPH0119598B2 - - Google Patents

Info

Publication number
JPH0119598B2
Authority
JP
Japan
Prior art keywords
monosyllabic
speech
matching
feature parameters
candidates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
JP57022358A
Other languages
Japanese (ja)
Other versions
JPS58159589A (en)
Inventor
Yasuo Sato
Takayuki Ooyama
Takayuki Fujimoto
Tadayasu Sugita
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to JP57022358A
Publication of JPS58159589A
Publication of JPH0119598B2
Granted

Description

[Detailed Description of the Invention]

(a) Technical Field of the Invention
The present invention relates to speech recognition devices, and in particular to a monosyllabic speech recognition method that recognizes, with a high recognition rate, the input speech of a speaker whose monosyllabic utterances have been registered in the device in advance.

(b) Background of the Technology
With recent advances in speech recognition technology, there is demand for speech recognition devices that make few errors when recognizing a speaker's voice. In the prevailing approach, the speaker's monosyllabic utterances are converted into feature parameters and stored in advance; the feature parameters of an unknown input monosyllable are then matched against the stored feature parameters, and the most similar entry is recognized as the corresponding monosyllable. However, the feature parameters of even the same monosyllable vary with the manner of utterance, so it is difficult to reduce errors to zero even when the same monosyllable is registered several times with different manners of utterance. In particular, for monosyllables whose feature parameters are prone to recognition errors, the recognition rate cannot be improved unless the matching method itself is taken into account.
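The conventional single-pass scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the patent names DP matching but does not specify the local distance or the path constraints, so the Euclidean frame distance, the symmetric step pattern, and the names `frame_distance`, `dtw_distance`, and `recognize` are all hypothetical, as are the toy templates.

```python
import math

def frame_distance(x, y):
    """Euclidean distance between two feature frames (assumed local distance)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

def dtw_distance(a, b):
    """DP-matching (dynamic time warping) distance between two frame sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best accumulated distance aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(templates, unknown):
    """Return the label of the registered template most similar to the input."""
    return min(templates, key=lambda label: dtw_distance(templates[label], unknown))

# Toy registered templates (hypothetical two-channel features).
templates = {
    "sa": [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)],
    "ta": [(0.0, 0.0), (0.0, 1.0), (0.0, 1.0)],
}
# Same frames as "sa", with the first frame held one step longer.
unknown = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
print(recognize(templates, unknown))
```

The time warping absorbs the difference in duration, so the stretched input still aligns with its template; what it cannot absorb is a change in the feature values themselves, which is the failure mode the background section points to.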

(c) Object of the Invention
In response to the above demand, the object of the present invention is to provide a speech recognition device with a high recognition rate by means of a monosyllabic speech recognition method in which the feature parameters of the input monosyllable are compared with pre-registered feature parameters, a plurality of candidates is selected in order starting from the most similar, the re-matching parameters are varied for each combination of those candidates, the candidates are re-matched against the feature parameters of the unknown input monosyllable, and the most similar one is recognized as the corresponding monosyllable. In addition, the re-matching uses DP matching, the same method as the initial matching, which simplifies the device configuration.

(d) Configuration of the Invention
In the present invention, monosyllabic speech is registered in advance. The feature parameters of an unknown input monosyllable are DP-matched against the feature parameters of all registered monosyllables, and a plurality of re-matching candidates is selected from the registered monosyllables in descending order of similarity. The unknown input monosyllable is then re-matched against those candidates using re-matching parameters determined by the combination of candidates, and the most similar candidate is recognized as the corresponding monosyllable and output.

Because the distinguishing features lie in the frequency spectrum, the re-matching parameters are obtained by applying weights to the frequency spectrum of the monosyllabic speech; that is, components in certain frequency bands are emphasized and the other frequency components are attenuated. Furthermore, consonants differ in level when uttered, and for some of them the recognition rate improves when the threshold that separates speech from noise is changed: for some it is better to lower the threshold even at the risk of picking up noise, while for others it is better to raise the threshold so that noise is not picked up. These conditions are applied selectively, according to the selected candidates, when re-matching.
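The two-pass flow in this section can be sketched as below. It is a minimal illustration, not the patented circuit: the patent does not give the number of candidates, the weight values, or the local distance, so `n_candidates`, the `weight_table` contents, the Euclidean frame distance, and all function names are assumptions made for the example.

```python
import math

def frame_distance(x, y):
    """Euclidean distance between two feature frames (assumed local distance)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

def dtw_distance(a, b, weights=None):
    """DP-matching distance; optional per-channel weights scale both inputs."""
    if weights is not None:
        a = [tuple(w * x for w, x in zip(weights, frame)) for frame in a]
        b = [tuple(w * x for w, x in zip(weights, frame)) for frame in b]
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize_two_pass(templates, unknown, weight_table, n_candidates=2):
    # First pass: match against every registered monosyllable, keep the best few.
    ranked = sorted(templates, key=lambda k: dtw_distance(templates[k], unknown))
    candidates = ranked[:n_candidates]
    # The re-matching parameters depend on which candidates were selected together.
    weights = weight_table.get(frozenset(candidates))
    # Second pass: re-match only the candidates, reusing the same DP matcher.
    return min(candidates, key=lambda k: dtw_distance(templates[k], unknown, weights))

# Toy data: channel 0 is unreliable, channel 1 separates "sa" from "ta".
templates = {"sa": [(1.0, 1.0)], "ta": [(0.0, 0.0)], "ka": [(5.0, 5.0)]}
unknown = [(0.2, 0.6)]
weight_table = {frozenset({"sa", "ta"}): (0.2, 2.0)}  # hypothetical weights

print(recognize_two_pass(templates, unknown, {}))            # no re-matching weights
print(recognize_two_pass(templates, unknown, weight_table))  # pair-specific weights
```

Keying the table on a `frozenset` of candidate labels reflects the text's point that the parameters are chosen per combination of candidates, not per individual candidate. Reusing `dtw_distance` for both passes mirrors the remark that the initial matching and the re-matching share the same DP method.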

(e) Embodiment of the Invention
The figure is a block diagram of a circuit showing one embodiment of the present invention.

First, to register monosyllabic speech in advance, the speaker connects switching unit 3 to parameter storage unit 4 under the control of control unit 12 and applies monosyllabic speech to the input. Preprocessing unit 1 performs speech level adjustment, analog-to-digital conversion, and the like, and passes the result to parameter extraction unit 2, which extracts the feature parameters and stores them in parameter storage unit 4.

Next, to perform recognition, switching unit 3 is connected to memory unit 5 and the speaker utters a monosyllable. Under the control of control unit 12, all feature parameters stored in parameter storage unit 4 are read out and matched in matching unit 8 against the feature parameters from memory unit 5. Control unit 12 takes as the first candidate the monosyllable whose feature parameters, as read from parameter storage unit 4, most closely resemble those from memory unit 5, and selects a plurality of candidates in order. The feature parameters corresponding to these candidates are input from parameter storage unit 4 to multiplier 9, starting with the first candidate. At the same time, the feature parameters in memory unit 5 are input to multiplier 10, and frequency weight storage unit 6 sends to multipliers 9 and 10 the frequency weights determined by the combination of candidates, that is, weights applied to the frequency spectrum that emphasize the components in certain frequency bands and attenuate the components in the other bands. Furthermore, when the combination of candidates calls for a different threshold, the optimal threshold is sent from threshold storage unit 7 to re-matching unit 11.

Re-matching unit 11 performs re-matching using the outputs of multipliers 9 and 10 and the threshold from threshold storage unit 7. The candidates are re-matched in order from the first candidate, and the monosyllable corresponding to the most similar feature parameters is sent from the output via control unit 12 as the recognition result. This operation is repeated each time the speaker utters a monosyllable, and the re-matching candidates are selected anew each time. Even for the same monosyllable the candidates do not necessarily form the same combination, so the re-matching parameters are likewise selected for each combination of candidates.
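The two inputs to the re-matching unit can be illustrated with a short sketch: a per-channel multiplication (the role the text assigns to multipliers 9 and 10) and a level threshold that fixes where the matching interval starts. The patent does not define how the speech level is measured, so the sum of channel magnitudes used here is an assumption, and `apply_weights` and `trim_to_onset` are hypothetical names.

```python
def apply_weights(frames, weights):
    """Multiply every frequency channel of every frame by its weight."""
    return [tuple(w * x for w, x in zip(weights, frame)) for frame in frames]

def trim_to_onset(frames, threshold):
    """Drop leading frames whose level is below the onset-detection threshold.

    The level of a frame is taken as the sum of channel magnitudes (assumed).
    """
    for i, frame in enumerate(frames):
        if sum(abs(x) for x in frame) >= threshold:
            return frames[i:]
    return []

# Toy input: near-silence, a weak consonant frame, then two strong vowel frames.
frames = [(0.01, 0.02), (0.10, 0.15), (0.60, 0.90), (0.70, 0.80)]

low = trim_to_onset(frames, threshold=0.2)   # keeps the weak consonant frame
high = trim_to_onset(frames, threshold=1.0)  # starts at the first strong frame
print(len(low), len(high))
print(apply_weights([(1.0, 2.0)], (2.0, 0.5)))
```

A low threshold keeps weak consonant energy inside the matching interval at the cost of admitting noise, while a high threshold does the opposite; which is preferable depends on the candidate combination, as the text describes.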

In this embodiment, to reduce discrimination errors between monosyllables whose consonant begins with /s/, such as サ (sa), and monosyllables whose consonant begins with /t/, such as タ (ta), good results were obtained by emphasizing the components of the monosyllable's frequency spectrum in the ranges of approximately 250 Hz to 600 Hz and 3400 Hz to 4500 Hz, attenuating the other frequency components, and lowering the threshold of the speech onset detection level. Likewise, good results were obtained for monosyllables beginning with /b/ and /p/, such as バ (ba) and パ (pa), and with /z/ and /s/, such as ザ (za) and サ (sa), by emphasizing the components in the range of approximately 250 Hz to 1300 Hz, and for monosyllables beginning with /d/ and /z/, such as ダ (da) and ザ (za), by raising the threshold of the speech onset detection level to remove noise at the speech onset.
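The band settings reported for this embodiment could be held in a lookup table such as the one sketched below. The band edges and the direction of each threshold change come from the text; the gain values `boost` and `cut`, the filter-bank centre frequencies, and the names `EMPHASIS`, `THRESHOLD_SHIFT`, and `band_weights` are hypothetical, since the patent gives no numeric weights.

```python
def band_weights(centers_hz, emphasized_bands, boost=2.0, cut=0.5):
    """One weight per channel: `boost` inside an emphasized band, `cut` elsewhere."""
    return [
        boost if any(lo <= f <= hi for lo, hi in emphasized_bands) else cut
        for f in centers_hz
    ]

# Emphasized bands in Hz, keyed by the confusable consonant pair.
EMPHASIS = {
    frozenset({"s", "t"}): [(250, 600), (3400, 4500)],
    frozenset({"b", "p"}): [(250, 1300)],
    frozenset({"z", "s"}): [(250, 1300)],
}

# Direction in which the speech onset threshold is moved for a pair.
THRESHOLD_SHIFT = {
    frozenset({"s", "t"}): "lower",
    frozenset({"d", "z"}): "raise",
}

centers = [200, 400, 800, 1600, 3200, 4000, 5000]  # hypothetical filter bank

print(band_weights(centers, EMPHASIS[frozenset({"s", "t"})]))
print(band_weights(centers, EMPHASIS[frozenset({"b", "p"})]))
```

A table of this kind would stand in for frequency weight storage unit 6 and threshold storage unit 7, with the selected candidate pair as the lookup key.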

(f) Effects of the Invention
As described above, the present invention extracts and stores the frequency-spectrum features of the feature parameters peculiar to each monosyllable, and re-matches the feature parameters of the speaker's monosyllable according to those spectral features. This greatly reduces recognition errors, so the effect is considerable. Moreover, because matching is DP matching under the control of the control unit, the matching unit and the re-matching unit can be the same unit, which simplifies the device configuration.

[Brief Description of the Drawings]

The figure is a block diagram of a circuit showing one embodiment of the present invention. 1 is the preprocessing unit, 2 is the parameter extraction unit, 3 is the switching unit, 4 is the parameter storage unit, 5 is the memory unit, 6 is the frequency weight storage unit, 7 is the threshold storage unit, 8 is the matching unit, 9 and 10 are multipliers, 11 is the re-matching unit, and 12 is the control unit.

Claims (1)

[Claims]
1. A monosyllabic speech recognition method for a speech recognition device that recognizes monosyllabic speech by matching the feature parameters of pre-registered monosyllabic speech against the feature parameters of an unknown input monosyllable, characterized in that means are provided for matching the unknown input monosyllable against all registered monosyllables, then selecting a plurality of re-matching candidates from the registered monosyllables on the basis of the matching results, and selecting re-matching parameters for each combination of the re-matching candidates and performing re-matching, whereby the monosyllabic speech is recognized.
2. The monosyllabic speech recognition method according to claim 1, characterized in that at least one of the following is selected as the re-matching parameter: feature parameters obtained by multiplying the frequency spectrum of the monosyllabic speech by frequency weights, and feature parameters obtained by varying the threshold of the speech detection level used to determine the matching interval.
JP57022358A 1982-02-15 1982-02-15 Monosyllabic voice recognition system Granted JPS58159589A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP57022358A JPS58159589A (en) 1982-02-15 1982-02-15 Monosyllabic voice recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP57022358A JPS58159589A (en) 1982-02-15 1982-02-15 Monosyllabic voice recognition system

Publications (2)

Publication Number Publication Date
JPS58159589A JPS58159589A (en) 1983-09-21
JPH0119598B2 JPH0119598B2 (en) 1989-04-12

Family

ID=12080407

Family Applications (1)

Application Number Title Priority Date Filing Date
JP57022358A Granted JPS58159589A (en) 1982-02-15 1982-02-15 Monosyllabic voice recognition system

Country Status (1)

Country Link
JP (1) JPS58159589A (en)

Also Published As

Publication number Publication date
JPS58159589A (en) 1983-09-21

Similar Documents

Publication Publication Date Title
US6922668B1 (en) Speaker recognition
US6574596B2 (en) Voice recognition rejection scheme
Soleymani et al. Prosodic-enhanced siamese convolutional neural networks for cross-device text-independent speaker verification
US11081115B2 (en) Speaker recognition
US7085718B2 (en) Method for speaker-identification using application speech
JPH0119598B2 (en)
JP2001350494A (en) Device and method for collating
JP2001265387A (en) Speaker collating device and method
JPH0119599B2 (en)
JPS58159598A (en) Monosyllabic voice recognition system
JPS58159590A (en) Monosyllabic voice recognition system
JPS58159599A (en) Monosyllabic voice recognition system
JPS63213899A (en) Speaker collation system
JPS58159597A (en) Monosyllabic voice recognition system
JP2891259B2 (en) Voice section detection device
JP2844592B2 (en) Discrete word speech recognition device
JPS63798B2 (en)
JPS58159591A (en) Monosyllabic voice recognition system
JPS6131878B2 (en)
JPH0469959B2 (en)
JPS62255999A (en) Word voice recognition equipment
JPS607492A (en) Monosyllable voice recognition system
JPH09297596A (en) Voice recognization device
JPH0352085A (en) Speaker collating system using self-organizing network
JPS6350898A (en) Voice recognition equipment