JPH0119598B2


Info

Publication number: JPH0119598B2
Application number: JP57022358A
Authority: JP
Inventors: Yasuo Sato; Takayuki Ooyama; Takayuki Fujimoto; Tadayasu Sugita
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-02-15
Filing date: 1982-02-15
Publication date: 1989-04-12
Also published as: JPS58159589A

Description

[Detailed Description of the Invention]

(a) Technical Field of the Invention
The present invention relates to a speech recognition device, and in particular to a monosyllabic speech recognition method that recognizes, with a high recognition rate, the input speech of a speaker whose monosyllabic speech has been registered in the device in advance.

(b) Background of the Technology
As speech recognition technology has improved in recent years, there has been demand for a speech recognition device that makes few errors when recognizing a speaker's speech. Speech recognition methods mainly convert the speaker's monosyllabic speech into feature parameters and store them in advance, then match the feature parameters of an unknown input monosyllable against the stored feature parameters and recognize the most similar one as the corresponding monosyllable. However, the feature parameters of the same monosyllable vary with the manner of utterance, and even if the same monosyllable is registered several times with different manners of utterance, it is difficult to reduce errors to zero. In particular, for monosyllables whose feature parameters are prone to recognition errors, the recognition rate cannot be improved unless the matching method is given consideration.
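The template matching described above can be sketched as follows. This is a minimal illustration, not the patented circuit: it assumes each utterance is a list of per-frame feature vectors, and it uses a basic dynamic time warping recurrence with Euclidean frame distance as the DP matching. The function names `dp_distance` and `recognize`, the step pattern, and the distance measure are all assumptions; the patent specifies none of them.

```python
import math

def dp_distance(a, b):
    """DP (dynamic time warping) distance between two frame sequences.

    Assumption: Euclidean distance per frame and the simplest symmetric
    step pattern; the patent only says "DP matching".
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(templates, unknown):
    """Return the label of the registered template closest to the input."""
    return min(templates, key=lambda label: dp_distance(templates[label], unknown))
```

Because the warping path absorbs differences in duration, a template spoken slowly can still match a faster utterance of the same syllable. It cannot absorb spectral differences between utterances, which is the residual error source the background section points to.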

(c) Purpose of the Invention
In response to the above demand, the object of the present invention is to provide a speech recognition device with a high recognition rate by means of a monosyllabic speech recognition method in which the feature parameters of the input monosyllable are compared with the pre-registered feature parameters, a plurality of candidates are selected in order starting from the most similar, the re-matching parameter is varied for each combination of those candidates, re-matching against the feature parameters of the unknown input monosyllable is performed, and the most similar candidate is recognized as the corresponding monosyllable. In addition, the re-matching uses DP matching, the same method as the initial matching, which simplifies the configuration of the device.

(d) Structure of the Invention
In the present invention, monosyllabic speech is registered in advance. The feature parameters of the unknown input monosyllable are DP-matched against the feature parameters of all registered monosyllables, and a plurality of re-matching candidates are selected from the registered monosyllables in descending order of similarity. The unknown input monosyllable is then re-matched against those candidates using a re-matching parameter determined by the combination of candidates, and the candidate found to be most similar is recognized as the corresponding monosyllable and output.

Because the distinguishing features lie in the frequency spectrum, the re-matching parameter is a weighted version of the monosyllable's frequency spectrum: the components in certain frequency bands are emphasized and the other frequency components are attenuated. Furthermore, consonants differ in level when uttered, and for some of them the recognition rate improves when the threshold that separates speech from noise is changed. For some it is better to lower the threshold even at the risk of picking up noise; for others it is better to raise the threshold so that noise is not picked up. These conditions are applied selectively, according to the candidates selected, when re-matching.
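The two-stage control flow can be sketched as below. To keep the example short it makes a strong simplifying assumption: each syllable is represented by a single spectrum vector and compared with a weighted squared distance, standing in for the DP matching of the actual method. The names `weighted_distance`, `two_stage_recognize`, and `rematch_weights`, and the default of two candidates, are hypothetical.

```python
def weighted_distance(x, y, weights):
    """Squared distance between two spectra with a per-band weight."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))

def two_stage_recognize(templates, unknown, rematch_weights, n_candidates=2):
    """First pass over all templates, then re-match only the top candidates.

    rematch_weights maps a frozenset of candidate labels to the band weights
    used in the second pass (an assumed data layout, not from the patent).
    """
    flat = [1.0] * len(unknown)
    # First pass: rank every registered syllable with unweighted matching.
    ranked = sorted(templates,
                    key=lambda s: weighted_distance(templates[s], unknown, flat))
    candidates = ranked[:n_candidates]
    # The re-matching parameter depends on the combination of candidates,
    # not on any single candidate; fall back to flat weights if none is stored.
    weights = rematch_weights.get(frozenset(candidates), flat)
    # Second pass: re-match only the candidates with the selected weights.
    return min(candidates,
               key=lambda s: weighted_distance(templates[s], unknown, weights))
```

Keying the lookup on a `frozenset` reflects the point that the parameter is chosen per combination: the same input may yield different candidate sets on different utterances, and each set selects its own weights.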

(e) Embodiment of the Invention
The figure is a circuit block diagram showing one embodiment of the present invention.

First, to register monosyllabic speech in advance, the switching unit 3 is connected to the parameter storage unit 4 under the control of the control unit 12, and the speaker utters the monosyllables at the input. The preprocessing unit 1 performs speech level adjustment, analog-to-digital conversion and the like and passes the result to the parameter extraction unit 2, which extracts the feature parameters and stores them in the parameter storage unit 4.

Next, to perform recognition, the switching unit 3 is connected to the storage unit 5 and the speaker utters a monosyllable. Under the control of the control unit 12, all feature parameters stored in the parameter storage unit 4 are read out and matched in the matching unit 8 against the feature parameters from the storage unit 5. The control unit 12 takes as the first candidate the monosyllable whose feature parameters, as read from the parameter storage unit 4, most closely resemble those from the storage unit 5, and selects further candidates in order of similarity.

The feature parameters corresponding to the candidates are input from the parameter storage unit 4 to the multiplier 9, starting with the first candidate. At the same time, the feature parameters in the storage unit 5 are input to the multiplier 10, and the frequency weight storage unit 6 sends to the multipliers 9 and 10 the frequency weight determined by the combination of candidates, that is, a weighting of the frequency spectrum that emphasizes the components in certain frequency bands and attenuates those in the other bands. In addition, where the combination of candidates calls for a change of threshold, the threshold storage unit 7 sends the optimum threshold to the re-matching unit 11. The re-matching unit 11 performs re-matching using the outputs of the multipliers 9 and 10 and the threshold from the threshold storage unit 7. The candidates are re-matched in order from the first candidate, and the monosyllable whose feature parameters are most similar is sent to the output via the control unit 12 as the recognition result.

This operation is repeated each time the speaker utters a monosyllable, and the re-matching candidates are selected each time. Even for the same monosyllable, the re-matching candidates do not necessarily form the same combination, so the re-matching parameter is likewise selected for each combination of re-matching candidates.
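The two operations that feed the re-matching unit 11 can be sketched as small functions: the multipliers 9 and 10 scale each frequency band, and the threshold decides where the matching interval begins. This is an illustrative software analogue under stated assumptions: frames are lists of band energies, frame energy is taken as the sum of the bands, and only the speech start is trimmed. The names `apply_weights` and `trim_to_speech_start` are hypothetical.

```python
def apply_weights(frames, weights):
    """Multiplier stage: scale each frequency band of every frame.

    The same weights are applied to the stored candidate (multiplier 9)
    and to the unknown input (multiplier 10).
    """
    return [[w * v for w, v in zip(weights, frame)] for frame in frames]

def trim_to_speech_start(frames, threshold):
    """Drop leading frames whose energy is below the detection threshold.

    Assumption: energy is the sum of a frame's band values.
    """
    for i, frame in enumerate(frames):
        if sum(frame) >= threshold:
            return frames[i:]
    return []  # no frame reached the threshold
```

A lower threshold keeps weak onset frames at the risk of keeping noise, while a higher one discards them, which is the trade-off the description assigns to the threshold storage unit 7.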

In this embodiment, when discriminating between a monosyllable whose consonant begins with /s/, such as sa (サ), and one whose consonant begins with /t/, such as ta (タ), discrimination errors were reduced to good effect by emphasizing the components in the ranges of approximately 250 Hz to 600 Hz and 3400 Hz to 4500 Hz in the frequency spectrum information of the monosyllable, attenuating the other frequency components, and lowering the threshold of the speech-onset detection level. Similarly good results were obtained for monosyllables beginning with /b/ and /p/, such as ba (バ) and pa (パ), and for those beginning with /z/ and /s/, such as za (ザ) and sa (サ), by emphasizing the components in the range of approximately 250 Hz to 1300 Hz, and for monosyllables beginning with /d/ and /z/, such as da (ダ) and za (ザ), by raising the threshold of the speech-onset detection level so as to remove noise at the speech onset.
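The settings reported for this embodiment can be collected into a lookup table, sketched below. The consonant pairings and frequency ranges come from the text; everything else is an assumption made for illustration, including the names `REMATCH_RULES`, `weight_vector`, and `rule_for`, the `boost` and `cut` gain values (the patent gives no magnitudes), and the `"unchanged"` label for pairs where the text does not mention the threshold.

```python
# Frequency ranges (Hz) and threshold direction per confusable consonant pair.
REMATCH_RULES = {
    frozenset({"s", "t"}): {"bands_hz": [(250, 600), (3400, 4500)], "threshold": "lower"},
    frozenset({"b", "p"}): {"bands_hz": [(250, 1300)], "threshold": "unchanged"},
    frozenset({"z", "s"}): {"bands_hz": [(250, 1300)], "threshold": "unchanged"},
    frozenset({"d", "z"}): {"bands_hz": [], "threshold": "raise"},
}

def rule_for(consonant_a, consonant_b):
    """Look up the rule for a pair of candidates, or None if none is stored."""
    return REMATCH_RULES.get(frozenset({consonant_a, consonant_b}))

def weight_vector(center_freqs_hz, bands_hz, boost=2.0, cut=0.5):
    """Build per-band weights: emphasize bands inside any listed range.

    boost and cut are placeholder gains, not values from the patent.
    """
    if not bands_hz:
        return [1.0] * len(center_freqs_hz)  # no spectral weighting for this pair
    return [boost if any(lo <= f <= hi for lo, hi in bands_hz) else cut
            for f in center_freqs_hz]
```

For example, `weight_vector([300, 1000, 4000], rule_for("s", "t")["bands_hz"])` emphasizes the first and third bands and attenuates the second.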

(f) Effects of the Invention
As explained above, the present invention extracts and stores the frequency-spectrum features of the feature parameters specific to each monosyllable, and re-matches the feature parameters of the speaker's monosyllable on the basis of those frequency-spectrum features. This greatly reduces recognition errors, so the effect is considerable. Moreover, because the matching is DP matching carried out under the control unit, the matching unit and the re-matching unit can be identical, and the device is easy to configure.

[Brief explanation of drawings]

The figure is a circuit block diagram showing one embodiment of the present invention. 1 is the preprocessing unit, 2 the parameter extraction unit, 3 the switching unit, 4 the parameter storage unit, 5 the storage unit, 6 the frequency weight storage unit, 7 the threshold storage unit, 8 the matching unit, 9 and 10 the multipliers, 11 the re-matching unit, and 12 the control unit.

Claims

1. A monosyllabic speech recognition method for a speech recognition device that recognizes monosyllabic speech by matching the feature parameters of monosyllabic speech registered in advance against the feature parameters of unknown input monosyllabic speech, characterized in that a re-matching means is provided which, after all of the registered monosyllabic speech has been matched against the unknown input monosyllabic speech, selects a plurality of re-matching candidates from the registered monosyllabic speech based on the matching results, selects a re-matching parameter for each combination of the re-matching candidates, and recognizes the monosyllabic speech.

2. The monosyllabic speech recognition method according to claim 1, wherein at least one of a feature parameter obtained by multiplying the frequency spectrum of the monosyllabic speech by a frequency weight and a feature parameter obtained by varying the threshold of the speech detection level that determines the matching interval is selected as the re-matching parameter.