JP3040430B2

JP3040430B2 - Voice recognition device

Info

Publication number: JP3040430B2
Application number: JP2136859A
Authority: JP
Inventors: 知弘岩▲崎▼
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1990-05-25
Filing date: 1990-05-25
Publication date: 2000-05-15
Anticipated expiration: 2015-05-15
Also published as: JPH0430199A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Application Field]

The present invention relates to a speech recognition device that recognizes a speech signal by following a sequence of symbols describing the utterance content of the speech signal and using the similarity to a plurality of standard patterns that represent partial sections of various speech signals corresponding to each symbol.

[Conventional technology]

FIG. 2 is a block diagram showing a conventional speech recognition device described in, for example, Sadaoki Furui, "Digital Speech Processing" (Tokai University Press, pp. 167-169). In the figure, 21 is an acoustic analysis circuit that acoustically analyzes a speech signal and converts it into a time series of feature parameters. 22 is a standard pattern table that stores a plurality of standard patterns representing partial sections of various speech signals corresponding to a specific symbol. 23 is a local matching circuit: from the time series of feature parameters output by the acoustic analysis circuit 21, it matches partial sections selected so as to be mutually exclusive on the time axis against the standard patterns stored in advance in the standard pattern table 22, and outputs a local similarity, that is, the similarity between a partial section of the feature parameter time series and a standard pattern. 24 is a symbol sequence dictionary that stores symbol sequences describing the utterance content of the speech signals to be recognized. 25 is an overall matching circuit: using the local similarities output by the local matching circuit 23, it performs matching against the symbol sequences stored in the symbol sequence dictionary 24 and outputs an overall similarity, that is, the similarity of the speech signal to a symbol sequence. 26 is a recognition result determination circuit that outputs, as the recognition result, the symbol sequence with the largest overall similarity output by the overall matching circuit 25. 27 is a speech signal, 28 a feature parameter, 29 a local similarity, 30 an overall similarity, 31 a recognition result, 32 a standard pattern, and 33 a symbol sequence.

Next, the operation will be described.

In speech recognition, using a number of standard patterns sufficient to absorb the variability of the speech to be recognized (hereinafter called multi-templates) makes it easier to obtain a standard pattern with a high local similarity. Multi-templates are used particularly often in speech recognition for unspecified speakers.

Here, the conventional speech recognition device is explained taking the phoneme as the unit for describing utterance content and the word as the recognition target, using recognition of the words "asi" and "aki" as an example. There are four phonemes, a, s, k, and i, each represented by two multi-templates. The standard patterns 32 of phoneme a are called Pa(1) and Pa(2), those of phoneme s Ps(1) and Ps(2), those of phoneme k Pk(1) and Pk(2), and those of phoneme i Pi(1) and Pi(2). The symbol sequence dictionary 24 is assumed to store the symbol sequence 33 of the word "asi" and the symbol sequence 33 of the word "aki".

First, the acoustic analysis circuit 21 acoustically analyzes the speech signal 27 and converts it into a time series of feature parameters 28. Suppose that in the local matching circuit 23 the time series of feature parameters 28 is divided into partial sections L1, L2, and L3 as shown in FIG. 3. The local matching circuit 23 matches each partial section against each standard pattern 32 and outputs the local similarities 29. Here, the local similarities 29 of the standard patterns Pa(1) and Pa(2) for partial section L1 are denoted Da(1)_L1 and Da(2)_L1, those of Ps(1) and Ps(2) for partial section L2 are denoted Ds(1)_L2 and Ds(2)_L2, and those of Pi(1) and Pi(2) for partial section L3 are denoted Di(1)_L3 and Di(2)_L3.

The overall matching circuit 25 performs overall matching from the local similarities 29 according to the symbol sequences 33 in the symbol sequence dictionary 24 and outputs the overall similarity 30. The overall similarity 30 of the word "asi", Dasi, is computed as

Dasi = max(Da(1)_L1, Da(2)_L1) × max(Ds(1)_L2, Ds(2)_L2) × max(Di(1)_L3, Di(2)_L3)

where max(D(1), D(2)) selects the larger of D(1) and D(2). Similarly, for the word "aki", let the partial sections be M1, M2, and M3 (corresponding to L1, L2, and L3 above) and let the local similarities 29 of the standard patterns Pk(1) and Pk(2) for partial section M2 be Dk(1)_M2 and Dk(2)_M2. The overall similarity 30, Daki, is then computed as

Daki = max(Da(1)_M1, Da(2)_M1) × max(Dk(1)_M2, Dk(2)_M2) × max(Di(1)_M3, Di(2)_M3)

The recognition result determination circuit 26 then compares the values of the overall similarities 30, Dasi and Daki, and outputs the symbol sequence of the word with the larger value as the recognition result 31. Making the standard patterns multi-templates in this way makes it easier to obtain a standard pattern with a high local similarity.
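The conventional scoring described above can be sketched in Python. This is only an illustration under stated assumptions: the numeric local similarities are made up, and the names `overall_similarity`, `local_asi`, and `local_aki` are hypothetical and do not appear in the patent.

```python
from math import prod

# Each entry is (phoneme, [similarity to template 1, similarity to template 2])
# for one partial section. All numbers are invented for illustration.
local_asi = [("a", [0.9, 0.6]), ("s", [0.8, 0.5]), ("i", [0.7, 0.4])]   # sections L1, L2, L3
local_aki = [("a", [0.9, 0.6]), ("k", [0.3, 0.75]), ("i", [0.7, 0.4])]  # sections M1, M2, M3


def overall_similarity(local):
    """Product over sections of the best template similarity in each section."""
    return prod(max(sims) for _, sims in local)


scores = {
    "asi": overall_similarity(local_asi),  # Dasi
    "aki": overall_similarity(local_aki),  # Daki
}

# The recognition result is the word with the largest overall similarity.
best = max(scores, key=scores.get)
```

With these invented numbers the word "asi" scores higher, but nothing in this scheme checks whether the chosen templates plausibly occur together, which is the weakness the invention addresses.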

[Problems to be solved by the invention]

Because the conventional speech recognition device is configured as described above, increasing the number of multi-templates conversely causes templates with a high local similarity to appear among the multi-templates of other phonemes. The utterance may then be misrecognized as another word, so the improvement in recognition rate gained by increasing the number of multi-templates is lost.

The present invention was made to solve the above problem. Among the multi-templates, the standard pattern nearest to each partial section of the feature parameter time series is selected, and the probability of each combination of nearest standard patterns selected in the mutually exclusive partial sections on the time axis is obtained in advance. At recognition time, when this combination probability is small, a larger penalty is applied to the overall similarity, which lowers the overall similarity to incorrect symbol sequences. The object is thus to obtain a speech recognition device that keeps misrecognition to a minimum.

[Means for solving the problem]

The speech recognition device according to the present invention comprises: an acoustic analysis circuit that acoustically analyzes a speech signal and converts it into a time series of feature parameters; a standard pattern table that stores a plurality of standard patterns representing partial sections of various speech signals corresponding to a specific symbol; a local matching circuit that, from the time series of feature parameters output by the acoustic analysis circuit, matches partial sections selected so as to be mutually exclusive on the time axis against the standard patterns stored in advance in the standard pattern table and outputs the local similarity between each partial section of the feature parameter time series and each standard pattern; a symbol sequence dictionary that stores symbol sequences describing the utterance content of the speech signals to be recognized; an overall matching circuit that, using the local similarities output by the local matching circuit, performs matching against the symbol sequences stored in the symbol sequence dictionary and outputs an overall similarity, that is, the similarity of the speech signal to a symbol sequence; and a nearest standard pattern selection circuit that, based on the local similarities output by the local matching circuit, selects for each symbol, according to the symbol sequences of the symbol sequence dictionary, the standard pattern nearest to each partial section of the feature parameter time series used in the local matching circuit, and outputs the sequence of those standard pattern numbers as a nearest standard pattern number sequence. The probabilities of the combinations of standard patterns contained in the nearest standard pattern number sequences are obtained and stored in a standard pattern combination probability table. For the nearest standard pattern number sequence output by the nearest standard pattern selection circuit, a penalty value is determined by referring to the standard pattern combination probability table, the overall similarity output by the overall matching circuit is corrected with this penalty, and the recognition result determination circuit outputs the symbol sequence with the largest corrected overall similarity as the recognition result.

[Action]

The recognition result determination circuit of the present invention refers to the standard pattern combination probability table for the nearest standard pattern number sequence and applies a large penalty to combinations of standard patterns with low probability, thereby lowering the overall similarity to incorrect symbol sequences.

[Embodiment of the invention]

An embodiment of the present invention is described below with reference to the drawings. In the figure, 1 is an acoustic analysis circuit that acoustically analyzes a speech signal and converts it into a time series of feature parameters. 2 is a standard pattern table that stores a plurality of standard patterns representing partial sections of various speech signals corresponding to a specific symbol. 3 is a local matching circuit: from the time series of feature parameters output by the acoustic analysis circuit 1, it matches partial sections selected so as to be mutually exclusive on the time axis against the standard patterns stored in advance in the standard pattern table 2, and outputs a local similarity, that is, the similarity between a partial section of the feature parameter time series and a standard pattern. 4 is a symbol sequence dictionary that stores symbol sequences describing the utterance content of the speech signals to be recognized. 5 is an overall matching circuit: using the local similarities output by the local matching circuit 3, it performs matching against the symbol sequences stored in the symbol sequence dictionary 4 and outputs an overall similarity, that is, the similarity of the speech signal to a symbol sequence. 7 is a nearest standard pattern selection circuit that, based on the local similarities output by the local matching circuit 3, selects for each symbol, according to the symbol sequences of the symbol sequence dictionary 4, the standard pattern nearest to each partial section of the feature parameter time series used in the local matching circuit 3, and outputs the sequence of those standard pattern numbers as a nearest standard pattern number sequence. 8 is a standard pattern combination probability table that stores the probabilities, obtained in advance, of the combinations of standard patterns contained in the nearest standard pattern number sequences. 6A is a recognition result determination circuit that determines a penalty value for the nearest standard pattern number sequence output by the nearest standard pattern selection circuit 7 by referring to the standard pattern combination probability table 8, corrects the overall similarity output by the overall matching circuit 5 with this penalty, and outputs the symbol sequence with the largest corrected overall similarity as the recognition result. 9 is a speech signal, 10 a feature parameter, 11 a local similarity, 12 an overall similarity, 13 a recognition result, 14 a standard pattern, 15 a symbol sequence, 16 a nearest standard pattern number sequence, and 17 a standard pattern combination probability.

Next, the operation will be described.

As in the description of the conventional example, the case of distinguishing the word "asi" from the word "aki" is described. There are four phonemes, a, s, k, and i, each represented by two multi-templates. The standard patterns 14 of phoneme a are called Pa(1) and Pa(2), those of phoneme s Ps(1) and Ps(2), those of phoneme k Pk(1) and Pk(2), and those of phoneme i Pi(1) and Pi(2). The symbol sequence dictionary 4 stores the symbol sequence 15 of the word "asi" and the symbol sequence 15 of the word "aki". The functions of the acoustic analysis circuit 1, the standard pattern table 2, the local matching circuit 3, the symbol sequence dictionary 4, and the overall matching circuit 5 are the same as in the conventional speech recognition device.

In the present invention as well, the time series of feature parameters 10 is divided into partial sections L1, L2, and L3, the local matching circuit 3 outputs the local similarities 11 of the standard patterns 14 for these partial sections, Da(1)_L1, Da(2)_L1, Ds(1)_L2, Ds(2)_L2, Di(1)_L3, and Di(2)_L3, and the overall matching circuit 5 computes the overall similarity 12 of the word "asi", Dasi.

Similarly, for the partial sections M1, M2, and M3 of the time series of feature parameters 10 (corresponding to L1, L2, and L3 above), the local matching circuit 3 outputs the local similarities 11 of the standard patterns 14 for these partial sections, Da(1)_M1, Da(2)_M1, Dk(1)_M2, Dk(2)_M2, Di(1)_M3, and Di(2)_M3, and the overall matching circuit 5 computes the overall similarity 12 of the word "aki", Daki.

Meanwhile, based on the local similarities 11 output by the local matching circuit 3, the nearest standard pattern selection circuit 7 selects the nearest standard pattern of each phoneme for the partial sections L1, L2, L3 or M1, M2, M3, and outputs the numbers of those standard patterns as a nearest standard pattern number sequence.

Suppose the values of the local similarities 11 are

Da(1)_L1 > Da(2)_L1
Ds(1)_L2 > Ds(2)_L2
Di(1)_L3 > Di(2)_L3
Da(1)_M1 > Da(2)_M1
Dk(1)_M2 < Dk(2)_M2
Di(1)_M3 > Di(2)_M3

Then the nearest standard pattern number sequences 16 that are output are Pa(1), Ps(1), Pi(1) for the word "asi" and Pa(1), Pk(2), Pi(1) for the word "aki".
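The selection step can be illustrated with a short Python sketch. The function name `nearest_pattern_sequence` and the numeric similarities are assumptions made for this example; the numbers are chosen only so that they satisfy the inequalities given above.

```python
def nearest_pattern_sequence(local):
    """For each partial section, pick the template with the highest local similarity.

    `local` is a list of (symbol, [similarity to template 1, template 2, ...]).
    Returns a list of (symbol, 1-based template number).
    """
    sequence = []
    for symbol, sims in local:
        best_index = max(range(len(sims)), key=lambda j: sims[j])
        sequence.append((symbol, best_index + 1))
    return sequence


# Invented values consistent with the stated inequalities:
# template 1 wins everywhere except for phoneme k on section M2.
local_asi = [("a", [0.9, 0.6]), ("s", [0.8, 0.5]), ("i", [0.7, 0.4])]
local_aki = [("a", [0.9, 0.6]), ("k", [0.3, 0.75]), ("i", [0.7, 0.4])]

seq_asi = nearest_pattern_sequence(local_asi)  # corresponds to Pa(1), Ps(1), Pi(1)
seq_aki = nearest_pattern_sequence(local_aki)  # corresponds to Pa(1), Pk(2), Pi(1)
```

Returning template numbers rather than similarity values mirrors the text, where only the numbers are passed on for the table lookup.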

The recognition result determination circuit 6A determines a penalty value for each of these nearest standard pattern number sequences 16 by referring to the standard pattern combination probability table 8. Let C(a(1), s(1), i(1)) be the penalty value for the combination of standard patterns Pa(1), Ps(1), Pi(1), and C(a(1), k(2), i(1)) the penalty value for the combination Pa(1), Pk(2), Pi(1). Using these penalty values, the recognition result determination circuit 6A corrects the overall similarities 12, Dasi and Daki, output by the overall matching circuit 5, obtains the corrected overall similarities DAasi and DAaki, and outputs the symbol sequence of the word with the larger value as the recognition result. The corrected overall similarities are computed as

DAasi = Dasi ÷ C(a(1), s(1), i(1))
DAaki = Daki ÷ C(a(1), k(2), i(1))

The lower the standard pattern combination probability, the larger the penalty value. The standard pattern combination probabilities are learned from the nearest standard pattern number sequences 16 output by the nearest standard pattern selection circuit 7 when speech signals of correctly uttered words are supplied to the speech recognition device. The combination probability is therefore high for the symbol sequence of the correct word and low for the symbol sequence of an incorrect word. If the speech now input to the device is an utterance of "asi", the probability of the combination Pa(1), Ps(1), Pi(1) is normally larger than that of the combination Pa(1), Pk(2), Pi(1). The penalty values therefore satisfy C(a(1), s(1), i(1)) < C(a(1), k(2), i(1)), so a larger penalty is applied to the overall similarity of the incorrect word "aki".
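The learning and correction steps might look as follows in Python. The patent states only that a lower combination probability gives a larger penalty and that the overall similarity is divided by the penalty. Everything else here is an assumption made for illustration: the relative-frequency estimate, the reciprocal penalty with a floor, the training counts, the raw similarities, and all function names.

```python
from collections import Counter


def learn_combination_probabilities(training_sequences):
    """Estimate each combination's probability as its relative frequency (assumed estimator)."""
    counts = Counter(tuple(seq) for seq in training_sequences)
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


def penalty(combo, table, floor=1e-3):
    """Assumed penalty: reciprocal of the probability, so rarer combinations cost more."""
    return 1.0 / max(table.get(tuple(combo), 0.0), floor)


def corrected_similarity(d, combo, table):
    """DA = D / C, as in the correction formulas of the description."""
    return d / penalty(combo, table)


ASI = (("a", 1), ("s", 1), ("i", 1))
AKI = (("a", 1), ("k", 2), ("i", 1))
OTHER = (("a", 2), ("s", 2), ("i", 2))

# Invented training data: nearest-pattern sequences seen for correctly labelled utterances.
training = [ASI] * 6 + [OTHER] * 3 + [AKI] * 1
table = learn_combination_probabilities(training)

# Invented raw overall similarities where the wrong word "aki" narrowly wins.
raw = {"asi": 0.45, "aki": 0.50}
corrected = {
    "asi": corrected_similarity(raw["asi"], ASI, table),
    "aki": corrected_similarity(raw["aki"], AKI, table),
}
winner = max(corrected, key=corrected.get)
```

In this toy setting the uncorrected scores would select "aki", while the corrected scores select "asi", which is the behaviour the description aims for. The floor in `penalty` is a design choice of this sketch to avoid division by zero for combinations never seen in training.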

In the above embodiment, the unit for describing utterance content was the phoneme, but it may instead be a syllable, a half-phoneme, a phoneme segment, or the like, with the same effect as in the above embodiment.

Also, in the above embodiment the recognition target was described as a word, but it may instead be a syllable, a phrase, or a sentence, with the same effect as in the above embodiment.

It is also possible to feed the corrected overall similarity from the recognition result determination circuit 6A back to the local matching circuit 3 and, using dynamic programming, have the local matching circuit 3 select the partial sections of the feature parameter time series so that the corrected overall similarity in the recognition result determination circuit 6A is maximized.
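The idea of choosing the partial sections to maximize the corrected overall similarity can be illustrated as below. Note that this sketch swaps in an exhaustive search over segment boundaries instead of the dynamic programming mentioned in the text, which is practical only for very short inputs. The one-dimensional feature values, the templates, the similarity measure, and all names are hypothetical.

```python
from itertools import combinations
from math import exp, prod

# Hypothetical 1-D templates: two per phoneme (template numbers 1 and 2).
TEMPLATES = {"a": [1.0, 1.5], "s": [5.0, 6.0], "i": [3.0, 3.5]}


def local_similarity(frames, template):
    """Assumed measure: exp(-mean absolute difference), a value in (0, 1]."""
    return exp(-sum(abs(f - template) for f in frames) / len(frames))


def score_segmentation(frames, symbols, bounds, penalty_of):
    """Corrected overall similarity for one choice of segment boundaries."""
    edges = [0, *bounds, len(frames)]
    sims, combo = [], []
    for symbol, start, end in zip(symbols, edges, edges[1:]):
        per_template = [local_similarity(frames[start:end], t) for t in TEMPLATES[symbol]]
        best = max(range(len(per_template)), key=per_template.__getitem__)
        sims.append(per_template[best])
        combo.append((symbol, best + 1))
    return prod(sims) / penalty_of(tuple(combo))


def best_segmentation(frames, symbols, penalty_of):
    """Try every way of cutting the frames into len(symbols) non-empty sections."""
    candidates = combinations(range(1, len(frames)), len(symbols) - 1)
    return max(candidates, key=lambda b: score_segmentation(frames, symbols, b, penalty_of))


frames = [1.0, 1.1, 5.0, 5.2, 4.9, 3.0, 3.1]
# A constant penalty is used here; a table-based penalty function could be passed instead.
bounds = best_segmentation(frames, ["a", "s", "i"], penalty_of=lambda combo: 1.0)
```

Here `bounds` holds the frame indices at which the second and third sections start. Because the penalty depends on the whole combination of templates, a dynamic programming formulation would need to carry the partial combination in its state; the exhaustive search avoids that detail at the cost of efficiency.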

[Effect of the invention]

As described above, according to the present invention, based on the local similarities output by the local matching circuit, the nearest standard pattern selection circuit selects for each symbol, according to the symbol sequences of the symbol sequence dictionary, the standard pattern nearest to each partial section of the feature parameter time series used in the local matching circuit, and outputs the sequence of those standard pattern numbers as a nearest standard pattern number sequence. The probabilities of the combinations of standard patterns contained in the nearest standard pattern number sequences are obtained in advance and stored in the standard pattern combination probability table. For the nearest standard pattern number sequence, a penalty value is determined by referring to the standard pattern combination probability table, the overall similarity output by the overall matching circuit is corrected with this penalty, and the recognition result determination circuit outputs the symbol sequence with the largest corrected overall similarity as the recognition result. This lowers the overall similarity to incorrect symbol sequences and yields a speech recognition device that makes few recognition errors.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a speech recognition device according to an embodiment of the present invention, FIG. 2 is a block diagram showing a conventional speech recognition device, and FIG. 3 is an explanatory diagram showing the time series of feature parameters used to explain the operation of the present invention and of the conventional speech recognition device.

1 is an acoustic analysis circuit, 2 a standard pattern table, 3 a local matching circuit, 4 a symbol sequence dictionary, 5 an overall matching circuit, 7 a nearest standard pattern selection circuit, 8 a standard pattern combination probability table, and 6A a recognition result determination circuit. In the drawings, the same reference numerals denote the same or corresponding parts.

Continuation of the front page

(56) References: JP-A-60-140484 (JP, A); JP-A-60-220399 (JP, A); JP-A-62-83796 (JP, A); JP-A-63-208096 (JP, A); JP-A-61-213900 (JP, A); Japanese Patent No. 2529207 (JP, B2); JP-B-5-52506 (JP, B2); JP-B-5-52507 (JP, B2); JP-B-5-34679 (JP, B2); JP-B-6-64076 (JP, B2); JP-B-6-30095 (JP, B2); JP-B-2-36960 (JP, B2); JP-B-5-85917 (JP, B2)

(58) Fields searched (Int. Cl.7, database name): G10L 15/10, G10L 15/20, JICST file (JOIS)

Claims

(57) [Claims]

A speech recognition device comprising: an acoustic analysis circuit that acoustically analyzes a speech signal and converts it into a time series of feature parameters; a standard pattern table that stores a plurality of standard patterns representing partial sections of various speech signals corresponding to a specific symbol; a local matching circuit that, from the time series of feature parameters output by the acoustic analysis circuit, matches partial sections selected so as to be mutually exclusive on the time axis against the standard patterns stored in advance in the standard pattern table and outputs the local similarity between each partial section of the feature parameter time series and each standard pattern; a symbol sequence dictionary that stores symbol sequences describing the utterance content of the speech signals to be recognized; an overall matching circuit that, using the local similarities output by the local matching circuit, performs matching against the symbol sequences stored in the symbol sequence dictionary and outputs an overall similarity, that is, the similarity of the speech signal to a symbol sequence; a nearest standard pattern selection circuit that, based on the local similarities output by the local matching circuit, selects for each symbol, according to the symbol sequences of the symbol sequence dictionary, the standard pattern nearest to each partial section of the feature parameter time series used in the local matching circuit, and outputs the sequence of those standard pattern numbers as a nearest standard pattern number sequence; a standard pattern combination probability table that stores the probabilities, obtained in advance, of the combinations of standard patterns contained in the nearest standard pattern number sequence; and a recognition result determination circuit that determines a penalty value for the nearest standard pattern number sequence output by the nearest standard pattern selection circuit by referring to the standard pattern combination probability table, corrects the overall similarity output by the overall matching circuit with this penalty, and outputs the symbol sequence with the largest corrected overall similarity as the recognition result.