JPH0119599B2

Info

Publication number: JPH0119599B2
Application number: JP57034713A
Authority: JP
Inventors: Yasuo Sato; Takayuki Ooyama; Takayuki Fujimoto; Tadayasu Sugita
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-03-05
Filing date: 1982-03-05
Publication date: 1989-04-12
Also published as: JPS58159600A

Description

[Detailed Description of the Invention]

(a) Technical Field of the Invention
The present invention relates to a speech recognition device, and in particular to a monosyllabic speech recognition method that recognizes unknown input monosyllabic speech from a speaker whose monosyllabic speech has been registered in the device in advance.

(b) Background of the Technology
As speech recognition technology has improved in recent years, speech recognition devices that make few recognition errors when recognizing a speaker's speech have come to be desired. In the usual method, the speaker's monosyllabic speech is converted into feature parameters and stored in advance. The feature parameters of an unknown input monosyllable are then matched against the stored feature parameters, and the most similar stored entry is recognized as the corresponding monosyllable. However, the feature parameters of the same monosyllable vary with the manner of utterance, so even if the same monosyllable is registered several times with different manners of utterance, it is difficult to reduce errors to zero. In particular, for monosyllables whose feature parameters are prone to recognition errors, the recognition rate cannot be improved unless the matching method itself is taken into account. For this reason, a monosyllabic speech recognition method has been proposed in which, after all registered monosyllables have been matched against the speaker's monosyllable, a plurality of re-matching candidates are selected from the registered monosyllables in descending order of similarity to the unknown input monosyllable, and the unknown input is re-matched against those candidates using a re-matching parameter determined by the combination of candidates, thereby improving the recognition rate. However, this re-matching method leaves room for improvement, and a remedy is desired.

(c) Purpose of the Invention
In response to the above need, the purpose of the present invention is, in a monosyllabic speech recognition method that uses the above re-matching method, to narrow down the number of re-matching candidates and thereby shorten the time required for re-matching, and at the same time to simplify the configuration of the device and improve its economy.

(d) Structure of the Invention
In the present invention, monosyllabic speech is registered in advance. The feature parameters of the unknown input monosyllable are DP-matched against the feature parameters of all registered monosyllables, and a plurality of re-matching candidates are selected from the registered monosyllables in descending order of similarity. The unknown input monosyllable is then re-matched against these candidates using a re-matching parameter determined by the combination of candidates, and the candidate found most similar is recognized as the corresponding monosyllable. When the candidates are selected, however, if the candidate ranked first in similarity by DP matching, or the combination of several top-ranked candidates, is one that has been determined in advance, the re-matching step is omitted and the candidate ranked first by DP matching is output as the recognition result. This shortens the monosyllabic speech recognition time and simplifies the re-matching circuit.
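The flow described in (d) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the city-block frame distance, the function names `dp_distance`, `rank_candidates` and `recognize`, and the choice to pass the skip test and the re-matching step in as callables. The patent does not specify the DP recurrence or the feature format, so a textbook dynamic-time-warping recurrence is used in its place.

```python
from typing import Callable, Dict, List, Sequence

Frame = Sequence[float]    # one feature vector (assumed: per-band values)
Pattern = Sequence[Frame]  # one monosyllable as a sequence of frames


def frame_distance(a: Frame, b: Frame) -> float:
    """City-block distance between two frames (an assumed local measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dp_distance(unknown: Pattern, template: Pattern) -> float:
    """Textbook DP (dynamic time warping) distance; smaller means more similar."""
    inf = float("inf")
    n, m = len(unknown), len(template)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(unknown[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rank_candidates(unknown: Pattern, templates: Dict[str, Pattern], top_n: int) -> List[str]:
    """Labels of the top_n registered monosyllables, most similar first."""
    ranked = sorted(templates, key=lambda label: dp_distance(unknown, templates[label]))
    return ranked[:top_n]


def recognize(
    unknown: Pattern,
    templates: Dict[str, Pattern],
    should_skip: Callable[[List[str]], bool],     # hypothetical table lookup
    rematch: Callable[[Pattern, List[str]], str], # hypothetical re-matching step
    top_n: int = 2,
) -> str:
    """DP-match, then either accept the first candidate or re-match."""
    candidates = rank_candidates(unknown, templates, top_n)
    if should_skip(candidates):
        return candidates[0]  # re-matching omitted, first-ranked candidate is output
    return rematch(unknown, candidates)
```

The point of the sketch is the early return: when the table lookup succeeds, the cost of the re-matching step is avoided entirely, which is where the claimed time saving comes from.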

(e) Embodiment of the Invention
The figure is a block diagram of a circuit showing one embodiment of the present invention. First, to register monosyllabic speech in advance, the speaker has switching unit 3 connected to parameter storage unit 4 under the control of control unit 8 and applies monosyllabic speech to the input. Preprocessing unit 1 performs speech level adjustment, analog-to-digital conversion and the like and sends the result to parameter extraction unit 2, which extracts the feature parameters of the monosyllable and stores them in parameter storage unit 4.

Next, to have a monosyllable recognized, the speaker has switching unit 3 connected to storage unit 5 under the control of control unit 8 and utters the monosyllable. By the same operation as above, the feature parameters of the unknown input monosyllable enter storage unit 5 via preprocessing unit 1, parameter extraction unit 2 and switching unit 3. Under the control of control unit 8, they are DP-matched in matching unit 6 against the feature parameters of all monosyllables stored in parameter storage unit 4. The monosyllable whose feature parameters are most similar is selected as the first-ranked re-matching candidate, further re-matching candidates are then selected in order, and the candidates are sent to determination unit 7.

Determination unit 7 compares the first-ranked candidate, or the combination of the top-ranked candidates, with the candidates stored in advance in a table. If an identical entry exists, the re-matching step is omitted and, via control unit 8, the candidate that matching unit 6 ranked first is selected as the recognition result. If the first-ranked candidate or the combination of top-ranked candidates is not one of the predetermined entries, the candidates are sent from determination unit 7 to control unit 8 to be re-matched and recognized.

Control unit 8 causes the feature parameters corresponding to the re-matching candidates to be sent from parameter storage unit 4 to multiplier 10, and the feature parameters of the unknown input monosyllable held in storage unit 5 to be sent to multiplier 11. Determination unit 7 causes frequency weight storage unit 12 to send to multipliers 10 and 11 the re-matching parameter determined by the candidates, that is, a weighting that emphasizes the components in the frequency bands suited to distinguishing the candidates from one another and attenuates the components in the other bands. Determination unit 7 also causes threshold storage unit 13 to send to re-matching unit 9 a threshold, a parameter that determines the optimal matching interval for the candidates. Re-matching unit 9 performs re-matching using the outputs of multipliers 10 and 11 and the threshold from threshold storage unit 13. The re-matching candidates are re-matched against the unknown input monosyllable in order starting from the first candidate, and the most similar candidate is sent from control unit 8 to the output as the recognition result.
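The role of the frequency weights in re-matching can be sketched as follows. This is an illustration under stated assumptions, not the circuit of the embodiment: features are assumed to be per-band values, frames are compared one-to-one in order (the text does not say how re-matching unit 9 aligns frames), and the names `apply_weights`, `weighted_distance` and `rematch` are hypothetical. The threshold from threshold storage unit 13 is left out because the text does not say how it defines the matching interval.

```python
from typing import Dict, List, Sequence

Frame = Sequence[float]    # assumed: one value per frequency band
Pattern = Sequence[Frame]


def apply_weights(pattern: Pattern, weights: Sequence[float]) -> List[List[float]]:
    """Scale each band by its weight (the job of multipliers 10 and 11)."""
    return [[w * x for w, x in zip(weights, frame)] for frame in pattern]


def weighted_distance(unknown: Pattern, template: Pattern, weights: Sequence[float]) -> float:
    """City-block distance after weighting both patterns with the same weights.

    Frames are paired in order over the shorter length (an assumption).
    """
    wu = apply_weights(unknown, weights)   # corresponds to multiplier 11
    wt = apply_weights(template, weights)  # corresponds to multiplier 10
    frames = min(len(wu), len(wt))
    return sum(abs(a - b) for i in range(frames) for a, b in zip(wu[i], wt[i]))


def rematch(unknown: Pattern, candidates: Dict[str, Pattern], weights: Sequence[float]) -> str:
    """Return the candidate label closest to the unknown under the given weights."""
    return min(candidates, key=lambda label: weighted_distance(unknown, candidates[label], weights))


# Toy data: band 0 is assumed to separate the two candidates, band 2 is noisy.
candidates = {"ba": [[1.0, 0.0, 0.0]], "pa": [[0.0, 0.0, 1.0]]}
unknown = [[0.6, 0.0, 0.9]]
flat = rematch(unknown, candidates, [1.0, 1.0, 1.0])     # all bands equal
focused = rematch(unknown, candidates, [1.0, 1.0, 0.0])  # band 2 suppressed
```

With the toy data, the unweighted comparison is dominated by band 2, while suppressing that band lets band 0 decide. In the embodiment the weight vector would be looked up per candidate combination; here it is simply passed in.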

In this embodiment, examples of the candidates stored in the predetermined table of determination unit 7, for which the re-matching step is omitted, are as follows.

(1) Candidates for which the recognition rate in matching unit 6 is extremely good, for example when the first-ranked re-matching candidate is one of the following:
(i) monosyllables whose consonant begins with /w/, such as wa;
(ii) monosyllables whose consonant begins with /j/, such as ya, yu and yo.

(2) Combinations of first- and second-ranked re-matching candidates that are unlikely to arise, for example:
(i) monosyllables whose consonants begin with /b/ and /t/, such as ba and ta;
(ii) monosyllables whose consonants begin with /d/ and /k/, such as da and ka;
(iii) monosyllables whose consonants begin with /m/ and /h/, such as ma and ha;
(iv) monosyllables whose consonants begin with /g/ and /p/, such as ga and pa.
/b/ and /t/, /d/ and /k/, /m/ and /h/, and /g/ and /p/ are unlikely to be mistaken for each other, so a monosyllable that yields one of these combinations is regarded as an utterance that does not need re-matching, and the result of matching unit 6 is used.

(3) Candidates for which re-matching has little effect: monosyllables whose consonant begins with /r/, such as ra. /r/ is unstable and varies widely, and tends to be mistaken for various other monosyllables. Re-matching has little effect on it, so at present the cost-performance is poor and it is more advantageous to omit re-matching.
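The example table entries above could be held in a small lookup structure such as the following Python sketch. The representation is an assumption: candidates are reduced to their initial consonant, pairs are treated as unordered, and the names `SKIP_FIRST`, `SKIP_PAIRS` and `should_skip_rematch` are hypothetical. The patent only states that determination unit 7 compares candidates against a table.

```python
from typing import Optional

# Initial consonants for which the first-ranked candidate is accepted as is:
# /w/ and /j/ (recognized very reliably) and /r/ (re-matching helps little).
SKIP_FIRST = {"w", "j", "r"}

# First/second candidate consonant pairs that are rarely confused.
# Stored as frozensets so the order of the two candidates does not matter
# (treating the pairs as unordered is an assumption).
SKIP_PAIRS = {
    frozenset(pair)
    for pair in [("b", "t"), ("d", "k"), ("m", "h"), ("g", "p")]
}


def should_skip_rematch(first: str, second: Optional[str] = None) -> bool:
    """Return True when re-matching can be omitted for these candidates.

    `first` and `second` are the initial consonants of the first- and
    second-ranked candidates from DP matching.
    """
    if first in SKIP_FIRST:
        return True
    return second is not None and frozenset((first, second)) in SKIP_PAIRS
```

For instance, a first candidate beginning with /w/ is accepted immediately, /t/ with /b/ is accepted because the pair is in the table, and /b/ with /p/ falls through to re-matching.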

(f) Effects of the Invention
As explained above, in a monosyllabic speech recognition method that uses re-matching, the present invention narrows down the number of re-matching candidates, which shortens the time required for re-matching and makes it possible to simplify the components involved in the re-matching operation. Economy is thereby improved, so the effect is considerable.

[Brief explanation of drawings]

The figure is a block diagram of a circuit showing one embodiment of the present invention. 1 is a preprocessing unit, 2 is a parameter extraction unit, 3 is a switching unit, 4 is a parameter storage unit, 5 is a storage unit, 6 is a matching unit, 7 is a determination unit, 8 is a control unit, 9 is a re-matching unit, 10 and 11 are multipliers, 12 is a frequency weight storage unit, and 13 is a threshold storage unit.

Claims

[Claims]

1. A monosyllabic speech recognition method for a speech recognition device that matches all previously registered monosyllabic speech against unknown input monosyllabic speech, selects a plurality of re-matching candidates from the registered monosyllabic speech based on the matching results, selects a re-matching parameter for each combination of the re-matching candidates, and re-matches the unknown input monosyllabic speech, characterized in that, when the re-matching candidates are selected, re-matching is omitted if the re-matching candidate with the highest similarity is a predetermined one, or if the combination of re-matching candidates with high similarity is a predetermined one.