JP2856429B2

JP2856429B2 - Voice recognition method

Info

Publication number: JP2856429B2
Application number: JP1123612A
Authority: JP
Inventors: 康之正井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1989-05-17
Filing date: 1989-05-17
Publication date: 1999-02-10
Anticipated expiration: 2014-02-10
Also published as: JPH02302799A

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Industrial application field) The present invention relates to a speech recognition method used in speech recognition devices, and in particular to a speech recognition method with an improved technique for rejecting non-target speech, that is, input speech that is not a recognition target.

(Prior art) Speech is a highly natural way for humans to input and output information and makes an excellent man-machine interface, and it has long been the subject of various studies.

One speech recognition method for this purpose recognizes input speech by computing the similarity between a standard pattern, created by applying smoothing and differentiation to training speech patterns collected in advance, and a smoothed pattern of the input speech pattern obtained by analyzing the input speech (see Japanese Patent Application No. 62-252108).

In such speech recognition, whether the input speech is target speech has been determined using the similarity computed between the standard pattern and the smoothed pattern of the input speech pattern.

(Problems to be Solved by the Invention) As described above, whether the input is target speech is determined using the similarity computed between the standard pattern and the smoothed pattern of the input speech pattern. As a result, the similarity can be high even when non-target speech is input, which causes misrecognition.

This is explained in detail with reference to FIG. 3. Suppose, for example, that the training speech pattern is as shown in FIG. 3(a) and that the smoothed pattern of a non-target input speech pattern is as shown in FIG. 3(b). The similarity computed between the two patterns is then as shown in FIG. 3(c), and a high similarity is obtained. Misrecognition therefore occurs, and non-target speech could not be rejected when it was input.

Accordingly, an object of the present invention is to provide a speech recognition method that, when non-target speech is input, can reject it with high accuracy instead of misrecognizing it.

[Configuration of the Invention] (Means for Solving the Problems) The present invention is a speech recognition method that recognizes input speech by computing a similarity or a difference between an input speech pattern, obtained by analyzing the input speech, and a standard pattern created from training speech patterns collected in advance. The method is characterized in that the standard pattern is created using a plurality of filters that apply at least smoothing and differentiation to the training speech patterns, and in that a similarity or a difference is computed between the axis of the standard pattern obtained by differentiation and a differential pattern obtained by differentiating the input speech pattern, in order to determine whether the input is target speech.

(Operation) Differentiating both the training speech patterns and the input speech pattern emphasizes the features of the speech patterns, so the magnitude of the similarity becomes sensitive to differences between speech patterns. When non-target speech is input, it can therefore be rejected with high accuracy instead of being misrecognized.

(Embodiment) An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a schematic block diagram of a speech recognition device built using the speech recognition method according to the present invention. Speech input through a speech input unit (not shown) is converted into an electrical signal and captured. An acoustic analysis unit 1, consisting of band-pass filters and the like, performs acoustic analysis, and a speech section detection unit 2 detects the word speech section. From the input speech pattern of the detected section, a sample point extraction unit 3 extracts a predetermined number of sample points that divide the speech section equally along the time axis, yielding a sample pattern of size (number of feature vectors × number of sample points). A predetermined number of such sample patterns is collected for each category to be recognized and stored in a speech pattern storage unit 4.
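The sample point extraction step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `extract_sample_points`, the frame-list input format, and the nearest-frame selection rule are all assumptions, since the text only says that the sample points divide the speech section equally in time.

```python
def extract_sample_points(frames, num_points=18):
    """Pick num_points frames spaced evenly across a detected speech section.

    frames: list of feature vectors, one per analysis frame of the section.
    Returns num_points feature vectors (time-major: result[k] is sample point k).
    Nearest-frame selection is an assumption; interpolation would also fit the text.
    """
    if len(frames) < 2 or num_points < 2:
        raise ValueError("need at least two frames and two sample points")
    last = len(frames) - 1
    picked = []
    for k in range(num_points):
        # Sample point k sits at k/(num_points-1) of the way through the
        # section, i.e. the section is cut into (num_points - 1) equal parts.
        index = round(k * last / (num_points - 1))
        picked.append(list(frames[index]))
    return picked
```

With the default of 18 points the section is cut into 17 equal parts, matching the example dimensions used later in the description. Note that the result is indexed by time first, whereas the description writes patterns as (j,k) with the feature index first.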

The standard pattern 6 is created by a plurality of filters that apply at least smoothing and differentiation to the sample patterns stored in the speech pattern storage unit 4, for example by an orthogonalizing time filter unit 5 made up of a plurality of orthogonalizing time filters.

In the following description, each training speech pattern collected in the speech pattern storage unit 4 is assumed to be a data series consisting of 16-point acoustically analyzed feature vectors indexed by j (= 1, 2, ..., 16), taken at 18 sample points k (= 0, 1, 2, ..., 17) that divide the speech section into 17 equal parts.

Let a_m(j,k) denote the m-th of the training speech patterns collected for category i, three per category. The orthogonalizing time filter unit 5 then creates the standard pattern 6 as follows.

(1) First, the average pattern A(j,k) [j = 1, 2, ..., 16; k = 0, 1, 2, ..., 17] is obtained from the training speech patterns a_m(j,k) of category i.

(2) Next, using the average pattern A(j,k) obtained above, the first axis b1(j,k) of the standard pattern is computed as

    b1(j,k) = A(j,k-1) + 2*A(j,k) + A(j,k+1)   ... (2)   [j = 1, 2, ..., 16; k = 1, 2, ..., 16]

and registered in the standard pattern 6. This b1(j,k) is the average pattern A(j,k) smoothed along the time axis, and it is registered as the first-axis data that serves as the reference of the standard pattern 6.

(3) Next, using the same average pattern A(j,k), the second axis b2(j,k) of the standard pattern is computed as

    b2(j,k) = -A(j,k-1) + A(j,k+1)   ... (3)   [j = 1, 2, ..., 16; k = 1, 2, ..., 16]

normalized, and then registered in the standard pattern 6. This b2(j,k) is the average pattern A(j,k) differentiated along the time axis.

The standard pattern 6 is created by repeating steps (1) to (3) above for each category.
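Steps (1) to (3) can be sketched in plain Python as below. This is an illustration under stated assumptions: the average in step (1) is taken to be the arithmetic mean (its equation is not reproduced in this text), both axes are scaled to unit norm because the description later says the standard patterns are normalized in advance, and all function names are hypothetical.

```python
import math

def average_pattern(patterns):
    """Step (1): element-wise mean of the training patterns p[j][k] (assumed)."""
    count = len(patterns)
    rows, cols = len(patterns[0]), len(patterns[0][0])
    return [[sum(p[j][k] for p in patterns) / count for k in range(cols)]
            for j in range(rows)]

def smooth_axis(A):
    """Equation (2): A(j,k-1) + 2*A(j,k) + A(j,k+1) at interior sample points."""
    return [[row[k - 1] + 2 * row[k] + row[k + 1]
             for k in range(1, len(row) - 1)] for row in A]

def derivative_axis(A):
    """Equation (3): -A(j,k-1) + A(j,k+1) at interior sample points."""
    return [[-row[k - 1] + row[k + 1]
             for k in range(1, len(row) - 1)] for row in A]

def normalize(axis):
    """Scale an axis to unit Euclidean norm over all (j,k) entries."""
    norm = math.sqrt(sum(v * v for row in axis for v in row))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero axis")
    return [[v / norm for v in row] for row in axis]

def build_standard_pattern(patterns):
    """Return [first axis, second axis] for one category."""
    A = average_pattern(patterns)
    return [normalize(smooth_axis(A)), normalize(derivative_axis(A))]
```

For 16 x 18 training patterns each axis comes out as 16 x 16, because the three-point filters are only defined at k = 1, ..., 16.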

The processing procedure of the orthogonalizing time filter unit 5 may be replaced by the following one, which yields a nearly equivalent standard pattern 6. (1) The first axis b1(j,k) of the standard pattern [j = 1, 2, ..., 16; k = 1, 2, ..., 16] is computed from the collected training speech patterns a_m(j,k) and registered in the standard pattern 6.

(2) Next, the second axis b2(j,k) of the standard pattern [j = 1, 2, ..., 16; k = 1, 2, ..., 16] is computed from the training speech patterns a_m(j,k) and registered in the standard pattern 6.

These steps (1) and (2) are repeated once per category. In other words, instead of first computing the average pattern A(j,k) as described earlier, the first axis b1(j,k), smoothed along the time axis, and the second axis b2(j,k), differentiated along the time axis, may each be computed directly from the collected training speech patterns a_m(j,k).

The description above covers a standard pattern 6 with up to two axes, but a third and further axes may also be created, for example by taking a second derivative. In that case the training speech patterns should be extracted at, for example, 20 or more sample points rather than the 18 points used above. The first axis b1(j,k) of the standard pattern may then be computed as

    b1(j,k) = A(j,k-2) + 4*A(j,k-1) + 6*A(j,k) + 4*A(j,k+1) + A(j,k+2)   ... (6)   [j = 1, 2, ..., 16; k = 1, 2, ..., 16]

the second axis b2(j,k) as

    b2(j,k) = -A(j,k-2) - 2*A(j,k-1) + 2*A(j,k+1) + A(j,k+2)   ... (7)   [j = 1, 2, ..., 16; k = 1, 2, ..., 16]

and the second-derivative third axis b3(j,k) as

    b3(j,k) = -A(j,k-2) - 2*A(j,k-1) + 3*A(j,k) - 2*A(j,k+1) - A(j,k+2)   ... (8)   [j = 1, 2, ..., 16; k = 1, 2, ..., 16]
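The five-point filters of equations (6) to (8) all have the same shape, so they can be sketched as one generic routine with different coefficient tuples. The coefficients below are copied from the equations as they appear in this text; the routine name `apply_time_filter` and the choice to evaluate only those sample points where all five taps exist are assumptions made for the illustration.

```python
SMOOTH_5 = (1, 4, 6, 4, 1)       # equation (6)
DERIV_5 = (-1, -2, 0, 2, 1)      # equation (7)
SECOND_5 = (-1, -2, 3, -2, -1)   # equation (8), as printed in this text

def apply_time_filter(A, coeffs):
    """Apply an odd-length filter along the time axis of a pattern A[j][k].

    Only sample points k with a full window are produced (an assumption),
    so each output row is len(coeffs) - 1 entries shorter than the input row.
    """
    half = len(coeffs) // 2
    return [
        [sum(c * row[k + offset - half] for offset, c in enumerate(coeffs))
         for k in range(half, len(row) - half)]
        for row in A
    ]
```

For example, `apply_time_filter(A, DERIV_5)` on a linear ramp gives a constant output, as expected of a first-derivative filter.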

Next, the similarity computation at recognition time is described. A smoothed pattern creation unit 8 smooths the sample pattern W of the input speech V as

    X(j,k) = W(j,k-1) + 2*W(j,k) + W(j,k+1)   ... (9)   [j = 1, 2, ..., 16; k = 1, 2, ..., 16]

to obtain a smoothed pattern X. A recognition similarity computation unit 7 computes the similarity between X and the standard patterns b_{i,r} of category i, using all axes of the standard pattern 6 created as described above, and a determination unit 9 recognizes the input speech V according to this similarity.

The standard patterns b_{i,r} of category i are normalized in advance, and K_i denotes the number of standard patterns (number of axes) of category i. In the similarity expression, (·) denotes the inner product and ‖ ‖ denotes the norm.
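The recognition step can be sketched as follows. The similarity expression itself is not reproduced in this text, so the form used here, the sum over axes r = 1, ..., K_i of (X · b_{i,r})^2 divided by ‖X‖^2, is an assumption chosen to be consistent with the symbols described above (inner product, norm, pre-normalized axes, K_i axes per category). The function names and the dictionary layout of `standard_patterns` are hypothetical.

```python
def inner(P, Q):
    """Inner product of two patterns indexed as P[j][k]."""
    return sum(p * q for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow))

def smooth_input(W):
    """Equation (9): X(j,k) = W(j,k-1) + 2*W(j,k) + W(j,k+1)."""
    return [[row[k - 1] + 2 * row[k] + row[k + 1]
             for k in range(1, len(row) - 1)] for row in W]

def similarity(X, axes):
    """ASSUMED form: sum_r (X . b_r)^2 / ||X||^2, with unit-norm axes b_r."""
    energy = inner(X, X)
    if energy == 0:
        raise ValueError("input pattern has zero energy")
    return sum(inner(X, b) ** 2 for b in axes) / energy

def recognize(W, standard_patterns):
    """Return (best category, its similarity) for a sample pattern W.

    standard_patterns maps a category label to its list of axes.
    """
    X = smooth_input(W)
    scores = {cat: similarity(X, axes) for cat, axes in standard_patterns.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Under this assumed form the score lies between 0 and 1 when the axes of a category are orthonormal, and it reaches 1 only when X lies entirely in the span of those axes.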

Next, the method for determining whether the input speech is a target word is described, taking as an example the case where the recognition processing above returned category I as the result for the input speech V. A differential pattern creation unit 11 differentiates the sample pattern W of the input speech V as

    Y(j,k) = -W(j,k-1) + W(j,k+1)   ... (11)   [j = 1, 2, ..., 16; k = 1, 2, ..., 16]

to obtain a differential pattern Y. A rejection similarity computation unit 10 computes the similarity between Y and the second axis b_{I,2} of the standard pattern 6 of category I created as described above, and the determination unit 9 decides from this similarity value whether the input speech V is a target word. If the input speech V is judged to be a non-target word, the recognition result, category I, is rejected, and the speaker is asked to repeat the utterance if necessary.
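The rejection check can be sketched as below. The rejection similarity expression is not reproduced in this text, so the form (Y · b_{I,2})^2 / ‖Y‖^2 is an assumption modeled on the recognition similarity; the threshold value, the function names, and the handling of an all-zero differential pattern are likewise hypothetical.

```python
def inner(P, Q):
    """Inner product of two patterns indexed as P[j][k]."""
    return sum(p * q for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow))

def differentiate_input(W):
    """Equation (11): Y(j,k) = -W(j,k-1) + W(j,k+1)."""
    return [[-row[k - 1] + row[k + 1]
             for k in range(1, len(row) - 1)] for row in W]

def rejection_similarity(W, b2):
    """ASSUMED form: (Y . b2)^2 / ||Y||^2, where b2 is the unit-norm second axis."""
    Y = differentiate_input(W)
    energy = inner(Y, Y)
    if energy == 0:
        return 0.0  # a flat input has no derivative to compare (assumption)
    return inner(Y, b2) ** 2 / energy

def is_target(W, b2, threshold):
    """Accept the recognition result only if the similarity reaches threshold."""
    return rejection_similarity(W, b2) >= threshold
```

In use, `b2` would be the second axis of the category returned by the recognition step, and a result for which `is_target` is false would be rejected.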

In this method, rejection is thus based on the similarity between the differential pattern of the input sample pattern and the second axis of the standard pattern 6 of the category returned by the recognition processing. Because the input speech is differentiated, the rejection similarity value responds sensitively to differences in the input speech. Non-target words can therefore be rejected with higher accuracy than when the rejection similarity is computed from the input speech as is, or from its smoothed pattern, which is of great practical benefit.

This is explained in detail with reference to FIG. 3. As in the earlier example, suppose that the training speech pattern is as shown in FIG. 3(a) and that the smoothed pattern of a non-target input speech pattern is as shown in FIG. 3(b). The differential patterns of the two are then as shown in FIG. 3(d) and FIG. 3(e), respectively, and the similarity computed between these two differential patterns is as shown in FIG. 3(f), which is almost zero. The misrecognition that occurred in the conventional method therefore does not arise.
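The effect described for FIG. 3 can be reproduced with a toy one-channel example. The two five-point patterns below are invented for illustration and are not the data of FIG. 3, and the normalized squared inner product is an assumed stand-in for the similarity measure.

```python
def smooth(w):
    """Three-point smoothing: w[k-1] + 2*w[k] + w[k+1]."""
    return [w[k - 1] + 2 * w[k] + w[k + 1] for k in range(1, len(w) - 1)]

def differentiate(w):
    """Three-point derivative: -w[k-1] + w[k+1]."""
    return [-w[k - 1] + w[k + 1] for k in range(1, len(w) - 1)]

def normalized_similarity(p, q):
    """Squared inner product divided by both squared norms (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot * dot / (sum(a * a for a in p) * sum(b * b for b in q))

# Hypothetical patterns: both sit on a large common offset, so their smoothed
# versions look alike, but their slopes over time differ.
template = [5, 5, 6, 5, 5]   # stands in for the training pattern
intruder = [5, 6, 6, 6, 7]   # stands in for non-target input

smoothed_score = normalized_similarity(smooth(template), smooth(intruder))
derivative_score = normalized_similarity(differentiate(template),
                                         differentiate(intruder))
# smoothed_score is above 0.99, while derivative_score is exactly 0.0
```

The smoothed patterns are dominated by the shared offset, so their similarity is close to 1, whereas differentiation removes the offset and leaves only the temporal shape, which here is orthogonal between the two patterns.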

In this way, differentiating both the training speech patterns and the input speech pattern emphasizes the features of the speech patterns, so the magnitude of the similarity responds sensitively to differences between speech patterns. Even when non-target speech is input, it can therefore be rejected with high accuracy instead of being misrecognized.

FIG. 2 is a graph of the results of an experiment carried out to examine the performance of the present invention. The target vocabulary was 20 personal names, and each word was uttered three times to create the standard patterns. At recognition time, 40 words in total (the 20 target words and 20 non-target words) were each uttered twice. There were eight speakers, seven men and one woman. The graph in FIG. 2 plots the recognition rate on the horizontal axis and the rejection rate on the vertical axis as the similarity threshold used for the rejection decision is varied. Here, the recognition rate and the rejection rate are defined as follows.

In the graph of FIG. 2, the solid line shows the experimental result for the present invention, and the broken line shows the result when the similarity Si given by equation (10) is used directly for rejection.

As the graph in FIG. 2 shows, at a recognition rate of 94.0%, for example, the conventional method achieves a rejection rate of only about 5%, whereas the present invention achieves about 35%, a substantial improvement in rejection performance.

These experimental data show that high rejection performance is obtained by using the differential pattern of the input speech in the similarity computation for rejection. The method can therefore be said to be highly effective in improving speech recognition performance.

The present invention is not limited to the embodiment described above. For example, although standard patterns created by the orthogonalizing time filters were used for both recognition and rejection, another method such as so-called DP matching may be used for recognition, with standard patterns obtained by differentiating the training speech patterns used only for rejection.

Several variations are also possible for the coefficients of the differentiation filter. The essential point is that rejection is performed using a similarity or a difference between a standard pattern obtained by differentiating the training speech patterns and a differential pattern obtained by differentiating the input speech pattern, and this can be implemented in various modified forms.

Furthermore, the number of dimensions of the training speech patterns and similar parameters are not particularly limited, and the present invention can be implemented with various modifications without departing from its gist.

[Effects of the Invention] As described above, according to the present invention, differentiating both the training speech patterns and the input speech pattern emphasizes the features of the speech patterns, so the magnitude of the similarity becomes sensitive to differences between speech patterns. The invention can thus provide a speech recognition method that, when non-target speech is input, rejects it with high accuracy instead of misrecognizing it.

[Brief description of the drawings]

FIG. 1 is a schematic block diagram of a speech recognition device built using the speech recognition method according to the present invention, FIG. 2 is a graph showing the results of an experiment carried out to examine the performance of the present invention, and FIG. 3 is a diagram for explaining rejection processing when non-target speech is input.

1 ... acoustic analysis unit, 2 ... speech section detection unit, 3 ... sample point extraction unit, 4 ... speech pattern storage unit, 5 ... orthogonalizing time filter unit, 6 ... standard pattern, 7 ... recognition similarity computation unit, 8 ... smoothed pattern creation unit, 9 ... determination unit, 10 ... rejection similarity computation unit, 11 ... differential pattern creation unit.

Claims

(57) [Claims]

A speech recognition method that recognizes input speech by calculating a similarity or a difference between an input speech pattern, obtained by analyzing the input speech, and a standard pattern created based on training speech patterns collected in advance, characterized in that the standard pattern is created using a plurality of filters that apply at least smoothing and differentiation to the training speech patterns, and a similarity or a difference is calculated between the axis of the standard pattern obtained by the differentiation and a differential pattern obtained by differentiating the input speech pattern, to determine whether the input is target speech.

A speech recognition method that recognizes input speech by calculating a similarity or a difference between an input speech pattern, obtained by analyzing the input speech, and a first standard pattern created based on training speech patterns collected in advance, characterized in that a second standard pattern is created by differentiating the training speech patterns, and a similarity or a difference is calculated between the second standard pattern and a differential pattern obtained by differentiating the input speech pattern, to determine whether the input is target speech.