JPH0962292A - Reject method in time series pattern recognition processing and time series pattern recognition device mounting it

Info

Publication number: JPH0962292A
Application number: JP7215674A
Authority: JP
Inventors: Toshiyuki Odaka; 俊之小高; Akio Amano; 明雄天野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-08-24
Filing date: 1995-08-24
Publication date: 1997-03-07
Anticipated expiration: 2015-08-24
Also published as: JP3533773B2

Abstract

PROBLEM TO BE SOLVED: To reduce the amount of processing needed for rejection by finding the optimal path obtained from the state having the maximum probability among the states that can terminate an HMM network, finding the state having the maximum probability among all states, and judging the input to be rejected when that maximum-probability state is not on the optimal path. SOLUTION: An optimal path calculating means 103 finds, among the probabilities obtained from a collating means 101, the maximum probability over the states that can be terminal states, and finds the optimal path having that probability. A maximum likelihood state calculating means 104 finds the maximum among all probabilities obtained from the collating means 101 and finds the state having that probability. A determining means 105 judges whether the state obtained by the maximum likelihood state calculating means 104 lies on the optimal path obtained by the optimal path calculating means 103. If it does, the determining means outputs the pattern represented by the optimal path as the recognition result; if not, it outputs rejection as the recognition result.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a time-series pattern recognition device, and more particularly to a reject method capable of detecting the input of a pattern outside the recognition targets, and to a time-series pattern recognition device implementing that method.

[0002] Here, a reject method, or rejection, is a function by which the recognition process outputs, as its recognition result, an indication such as "input of data outside the recognition targets" or "no applicable (high-confidence) recognition result".

[0003]

[Prior Art] In general, a recognition process outputs as its recognition result the reference data most similar to the input data, chosen from the reference data given in advance as recognition targets.

[0004] Speech, music, handwritten characters, sign language, gestures, moving images, and the like can be converted into time-series patterns and recognized using, for example, the HMM (Hidden Markov Model), a statistical technique. Rejection is an essential function when these recognition technologies are built into application systems and put to practical use. In a speech recognition application, for example, a system that incorporates rejection and prompts the user to repeat the utterance is easier to use than one that malfunctions because it misrecognized what the user said.

[0005] One conceivable way to realize rejection is to set an absolute threshold on the probability value in advance. However, even when the recognition result is the same, the probability value obtained from the HMM itself varies if the environment in which the input data was produced differs, so setting an absolute threshold is difficult.

[0006] In contrast, there is a form of rejection that sets a relative threshold on the probability value. An example in speech recognition is the method described in Watanabe et al., "Rejection of unknown-word utterances by likelihood correction using syllable recognition," IEICE Transactions, Vol. J75-D-II, No. 12, pp. 2002-2009. This method matches the input against a first HMM network (corresponding to the reference patterns) that represents the words or sentences to be recognized by the application, and, separately from this first network, also matches it against a second HMM network that represents syllable strings with no constraint on the order of syllables (also called a syllable typewriter). Rejection is realized by using the probability value of the second HMM network's matching result as a relative reference.

[0007]

[Problems to be Solved by the Invention] The conventional reject method for time-series pattern recognition described above has the problem that matching against the second HMM network increases both the amount of processing and the storage capacity required. More computation means slower response. In addition, where computing resources are limited, storage use must be kept as low as possible. For example, when speech recognition is considered as one of the input means of a portable terminal, the same level of functionality must be achieved with as little computation and storage as possible, also from the standpoint of power consumption.

[0008] An object of the present invention is to provide a reject method that does not increase the amount of processing or the storage capacity in time-series pattern recognition based on a probabilistic model, including automata and the like. In particular, it provides a reject method using the HMM, which is one kind of automaton.

[0009] Another object of the present invention is to provide a reject method that does not increase the amount of processing or the storage capacity beyond what is required for matching against the HMM network representing the recognition targets, and a time-series pattern recognition device implementing that reject method. In particular, it provides a reject method that prevents malfunction caused by the input of sounds or speech outside the recognition targets, or by the erroneous input of speech with a part missing (devoiced speech), and a speech recognition device implementing that reject method.

[0010]

[Means for Solving the Problems] To achieve the above objects, the present invention provides a reject method, and a time-series pattern recognition device implementing it, in which, at the point when the probability calculation for each state of the HMM network with respect to a time-series input pattern has finished, two things are obtained: the optimal path obtained from the state having the maximum probability among the states that can be terminal states of the HMM network representing the recognition targets, and the state having the maximum probability among all states of that HMM network. When the state having the maximum probability among all states is not a state on the optimal path, the input is determined to be a rejection. This detects the input of patterns outside the recognition targets and prevents malfunction.

[0011] In a specific configuration of the present invention, in time-series pattern recognition based on a probabilistic model including automata and the like, at the point when the probability calculation for each state of the probabilistic model with respect to a time-series input pattern has finished, the optimal path obtained from the state having the maximum probability among the states that can be terminal states of the probabilistic model representing the recognition targets, and the state having the maximum probability among all states of that model, are obtained. Rejection is determined when the state having the maximum probability among all states is not a state on the optimal path.

[0012] In another configuration of the present invention, in time-series pattern recognition based on a probabilistic model including automata and the like, at the point when the probability calculation for each state of the probabilistic model with respect to a time-series input pattern has finished, the state having the maximum probability among the states that can be terminal states of the probabilistic model representing the recognition targets, and the state having the maximum probability among all states of that model, are obtained. The probability values of these two maximum-probability states are compared, and rejection is determined when their difference is larger than a predetermined threshold.

[0013] In a specific configuration using the HMM, in time-series pattern recognition based on the HMM, at the point when the probability calculation for each state of the HMM network with respect to a time-series input pattern has finished, the optimal path obtained from the state having the maximum probability among the states that can be terminal states of the HMM network representing the recognition targets, and the state having the maximum probability among all states of that HMM network, are obtained. Rejection is determined when the state having the maximum probability among all states is not a state on the optimal path.

[0014] In another specific configuration using the HMM, in time-series pattern recognition based on the HMM, at the point when the probability calculation for each state of the HMM network with respect to a time-series input pattern has finished, the state having the maximum probability among the states that can be terminal states of the HMM network representing the recognition targets, and the state having the maximum probability among all states of that HMM network, are obtained. The probability values of these two maximum-probability states are compared, and rejection is determined when their difference is larger than a predetermined threshold.

[0015] The speech recognition device of the present invention has: a speech input means that captures speech as an electrical signal; a speech analysis means that converts the features of the input speech into a time-series pattern and outputs it; a storage means that holds an HMM network representing the speech patterns to be recognized; a collating means that matches the HMM network held in the storage means against the time-series pattern and calculates the probability of each state on the HMM network; an optimal path calculating means that obtains the optimal path from the state having the maximum probability among the states that can be terminal states of the HMM network; a maximum likelihood calculating means that obtains the state having the maximum probability among all states of the HMM network; and a determining means that determines whether the state having the maximum probability is a state on the optimal path and, when it is not, outputs the recognition result "not applicable".

[0016] Alternatively, the determining means may obtain the probability of the optimal path, compare it with the maximum probability, and output the recognition result "not applicable" when the comparison result is larger than a predetermined threshold.

[0017]

[Operation] According to the present invention, patterns outside the recognition targets can be detected using only the matching against the HMM network that represents the recognition targets, so rejection can be realized with almost no increase in the amount of processing or the storage capacity.

[0018]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. A detailed account of pattern recognition by HMM can be found in, for example, Seiichi Nakagawa, "Speech Recognition by Probabilistic Models," IEICE, 1988, and is not repeated here. The present invention does not limit the type of recognition unit model, such as the HMM.

[0019] FIG. 1 is a flowchart showing one embodiment of the processing procedure of the present invention. FIG. 2 is a diagram showing an example of an HMM network.

[0020] The processing procedure of the present invention is described with reference to the flowchart of FIG. 1. First, the probabilities on the HMM network are calculated for the input pattern (1001). The input pattern is a time-series pattern; expressed as a string of symbols such as letters of the alphabet, it looks like "aaaabccccdd". At initialization, the start state (Ss in FIG. 2) is given probability 1 and all other states probability 0. Thereafter, at each unit of time, the probability that a state holding a probability (S1) transitions to a state connected to it by an arc (S2), and that the input symbol at that time appears, is computed and becomes the new probability of the destination state S2. At the final time, every state on the HMM network that a transition has reached holds a probability value. Ordinarily, at this final time the probabilities are compared only among the states that can be terminal states of the HMM network (Sf1, Sf2, Sf3, ... in FIG. 2), the state with the highest probability among them is found, and the path on the HMM network that led to it is selected as the recognition result candidate (1002). In the example of FIG. 2, this is "ABC". Let the state found here be Sbest and the selected path be Rbest. The present invention additionally finds the state with the highest probability among all states on the HMM network (1003). Let this state be Smax. Next, it is checked whether the state Smax is one of the states on the path Rbest (1004). If Smax is one of the states on Rbest, the pattern corresponding to Rbest ("ABC") is output as the recognition result (1005); if Smax is not a state on Rbest, "not applicable" is output as the recognition result (1006). This function of outputting "not applicable" is rejection.
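The procedure of steps 1001 to 1006 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation of the invention: it assumes a discrete HMM with emissions attached to the destination state, Viterbi (maximum) scoring in the log domain, and a hypothetical toy network with the two words "AB" and "CDE" rather than the network of FIG. 2. The names `viterbi`, `backtrace`, `recognize`, `ARCS`, `EMIT`, and `FINALS` are invented for the example.

```python
import math

NEG_INF = float("-inf")

def viterbi(n_states, start, arcs, emit, observations):
    """Time-synchronous scoring of every state (step 1001).

    arcs: {(src, dst): transition probability}
    emit: {state: {symbol: emission probability}}, all probabilities > 0
    Returns the final log score of each state and one backpointer list per frame.
    """
    score = [NEG_INF] * n_states
    score[start] = 0.0  # log(1): only the start state Ss has probability 1
    backptrs = []
    for sym in observations:
        new = [NEG_INF] * n_states
        back = [None] * n_states
        for (src, dst), p in arcs.items():
            if score[src] == NEG_INF:
                continue  # the source state has not been reached yet
            cand = score[src] + math.log(p) + math.log(emit[dst][sym])
            if cand > new[dst]:
                new[dst], back[dst] = cand, src
        score = new
        backptrs.append(back)
    return score, backptrs

def backtrace(state, backptrs):
    """Recover the states visited on the best path ending in `state`."""
    path = [state]
    for back in reversed(backptrs):
        state = back[state]
        path.append(state)
    return path[::-1]

def emission(main):
    # Hypothetical emission table: 0.8 for the state's own symbol, 0.05 otherwise.
    return {sym: (0.8 if sym == main else 0.05) for sym in "abcde"}

# Hypothetical toy network: Ss = 0; word "AB" = states 1, 2; word "CDE" = states 3, 4, 5.
ARCS = {(0, 1): 0.5, (0, 3): 0.5,
        (1, 1): 0.5, (1, 2): 0.5, (2, 2): 0.5,
        (3, 3): 0.5, (3, 4): 0.5, (4, 4): 0.5, (4, 5): 0.5, (5, 5): 0.5}
EMIT = {1: emission("a"), 2: emission("b"),
        3: emission("c"), 4: emission("d"), 5: emission("e")}
FINALS = {2: "AB", 5: "CDE"}  # terminal states and the pattern each represents

def recognize(observations):
    score, backptrs = viterbi(6, 0, ARCS, EMIT, observations)
    s_best = max(FINALS, key=lambda s: score[s])        # step 1002: Sbest
    if score[s_best] == NEG_INF:
        return None  # assumption: no reachable terminal state counts as rejection
    r_best = backtrace(s_best, backptrs)                # step 1002: Rbest
    s_max = max(range(6), key=lambda s: score[s])       # step 1003: Smax
    return FINALS[s_best] if s_max in r_best else None  # steps 1004 to 1006

print(recognize("aabb"))    # AB
print(recognize("ccddee"))  # CDE
print(recognize("cd"))      # None: Smax is state 4, which is not on the path to "AB"
```

In the last call the input is too short to reach the terminal state of "CDE", so the best terminal state belongs to "AB" while the most probable state overall sits in the middle of "CDE"; the path check therefore rejects the input without any threshold.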

[0021] There may be cases in which several paths or states with the same probability are found for Rbest and Smax respectively. The same idea applies: if there is a combination in which Smax is a state on Rbest, that result is output; if there is none, the input is determined to be a rejection.
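One possible way to handle such ties is sketched below. It is a hedged illustration only: the helper names `tied_argmax` and `accept_with_ties`, the tolerance `tol`, and the choice to return the first matching combination are assumptions, since the text does not say which result to output when several combinations match.

```python
def tied_argmax(scores, candidates, tol=1e-12):
    """Return every candidate whose score is within tol of the best one.

    Assumes at least one candidate has a finite score.
    """
    best = max(scores[s] for s in candidates)
    return [s for s in candidates if best - scores[s] <= tol]

def accept_with_ties(best_paths, max_states):
    """best_paths: list of (label, states on that tied optimal path Rbest).
    max_states: the tied maximum-probability states Smax.
    Returns the label of the first path containing some Smax, else None (rejection).
    """
    for label, path in best_paths:
        if any(s in path for s in max_states):
            return label
    return None

scores = [-5.0, -2.0, -2.0, -9.0]
s_max_candidates = tied_argmax(scores, range(4))  # states 1 and 2 tie
print(accept_with_ties([("AB", [0, 1, 3])], s_max_candidates))  # AB
print(accept_with_ties([("AB", [0, 3])], s_max_candidates))     # None
```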

[0022] When the input pattern is highly plausible as data within the recognition targets, matching succeeds and Sbest and Smax should coincide. For input outside the recognition targets, however, they do not necessarily coincide. This means that, within one part of the given HMM network, the path leading to the maximum-probability state Smax forcibly yields, as an approximate solution, a recognition result for the out-of-target time-series pattern that was input. The larger the HMM network, the broader its expressive range, so an approximate solution close to the true one is more easily obtained and out-of-target input becomes easier to detect. Taking a speech recognition device as an example, if the network is sufficiently large, the result closely approximates that of syllable recognition, which recognizes one syllable at a time. By the same principle as for detecting out-of-target input, short noises such as breaths, coughs, and throat clearing can be rejected, as can unknown words. Errors in speech section detection for devoiced speech and the like can also be detected.

[0023] The processing required for steps 1003 and 1004 in FIG. 1 is negligibly small compared with the probability calculation of the HMM network for the input pattern (1001). Another feature of this method is that no troublesome threshold setting is required.

[0024] FIG. 3 shows another embodiment of the processing procedure according to the present invention. This embodiment differs from the previous one in that a threshold is set on the probability value and used for the reject decision.

[0025] The processing procedure is described with reference to the flowchart of FIG. 3. First, the probabilities on the HMM network are calculated for the input pattern (2001). Ordinarily, at the final time the probabilities are compared only among the states that can be terminal states of the HMM network, the state with the highest probability among them is found, and the path on the HMM network that led to it is selected as the recognition result candidate (2002). Let the state found here be Sbest and the selected path be Rbest. Up to this point the procedure is the same as in the first embodiment described with FIG. 1. In addition, the probability at Sbest is retained as Pbest (2002). Next, the highest probability Pmax among all states on the HMM network is found (2003). Next, the difference between Pmax and Pbest is divided by the length of the input pattern (the number of frames, N) to normalize it, and the result is taken as the reference probability value ΔP (2004). Next, ΔP is compared with a preset threshold Pth (2005). If ΔP is smaller than Pth, the pattern corresponding to Pbest is output as the recognition result (2006); if ΔP is larger than Pth, "not applicable" is output as the recognition result (2006).
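The threshold test of steps 2004 and 2005 amounts to computing ΔP = (Pmax - Pbest) / N and comparing it with Pth. A minimal sketch follows. It assumes the scores are log probabilities, which is common practice but is not stated in the text, and it treats ΔP equal to Pth as acceptance, a case the text leaves open. The function name `reject_by_margin` and the numeric values are hypothetical.

```python
def reject_by_margin(p_max, p_best, n_frames, p_th):
    """Return True when the input should be rejected.

    p_max:    score of the best state among all states (step 2003)
    p_best:   score of the best terminal state Sbest (step 2002)
    n_frames: length N of the input pattern, used for normalization
    p_th:     preset threshold Pth
    """
    delta_p = (p_max - p_best) / n_frames  # step 2004: reference value
    return delta_p > p_th                  # step 2005

# Hypothetical log scores for a 10-frame input.
print(reject_by_margin(-10.0, -30.0, 10, 1.5))  # True: margin 2.0 exceeds 1.5
print(reject_by_margin(-10.0, -12.0, 10, 1.5))  # False: margin 0.2 is below 1.5
```

Because Pmax is taken over all states, including the terminal ones, ΔP is never negative; it is zero exactly when the best state overall is itself the best terminal state.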

[0026] FIG. 4 is a block diagram showing the configuration of a pattern recognition device according to the present invention.

[0027] In FIG. 4, the HMM network storage means 102 stores the HMM network that represents the recognition targets. The present invention does not limit how the HMM network is stored in the HMM network storage means 102. The collating means 101 calculates the probability that the input pattern corresponds to each of the patterns represented in the HMM network. The optimal path calculating means 103 finds, among the probabilities obtained from the collating means 101, the maximum probability over the states that can be terminal states, and finds the optimal path having that probability. The maximum likelihood state calculating means 104 finds the maximum among all probabilities obtained from the collating means and finds the state having that probability. The determining means 105 determines whether the state obtained by the maximum likelihood state calculating means 104 exists on the optimal path obtained by the optimal path calculating means 103. If it does, the determining means outputs the pattern represented by that optimal path as the recognition result; if it does not, it outputs rejection as the recognition result.
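The division of labor among means 101 to 105 might be wired together as in the following sketch. It is purely illustrative: the class `PatternRecognizer`, its field names, and the stub components (a lookup table standing in for a real collating means) are assumptions introduced here to show the data flow, not part of the described device.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Optional, Sequence, Tuple

@dataclass
class PatternRecognizer:
    network: object                                   # 102: HMM network storage means
    collate: Callable[[object, Sequence], object]     # 101: collating means
    optimal_path: Callable[[object], Tuple[str, Sequence[Hashable]]]  # 103
    max_state: Callable[[object], Hashable]           # 104

    def recognize(self, pattern: Sequence) -> Optional[str]:
        """Determining means 105: return the pattern label, or None for rejection."""
        result = self.collate(self.network, pattern)
        label, path = self.optimal_path(result)
        return label if self.max_state(result) in path else None

# Stub components: the "network" is a table of precomputed matching results.
table = {
    "aabb": {"label": "AB", "path": [0, 1, 2], "smax": 2},
    "cd":   {"label": "AB", "path": [0, 1, 2], "smax": 4},
}
rec = PatternRecognizer(
    network=table,
    collate=lambda net, pat: net[pat],
    optimal_path=lambda r: (r["label"], r["path"]),
    max_state=lambda r: r["smax"],
)
print(rec.recognize("aabb"))  # AB
print(rec.recognize("cd"))    # None
```

Keeping the four upstream means as injected callables mirrors the block diagram: only the determining step is fixed, and the collating step can be replaced without touching it.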

[0028] FIG. 5 is a block diagram showing one embodiment of a speech recognition device according to the present invention.

[0029] In FIG. 5, the speech input means 206 consists of a microphone and an analog-to-digital converter; it takes as input a speech waveform, which is a vibration of the air, and outputs a sequence of amplitude values sampled at a fixed sampling period. The speech analysis means 207 analyzes the data sequence output by the speech input means 206 at fixed time intervals or in blocks of a fixed number of samples, converts the features of the input data sequence into a sequence of feature vectors each containing several parameters, and outputs it. This feature vector sequence corresponds to the symbol string in the description of FIG. 1. The present invention does not limit the types of parameters used in this analysis. For example, LPC cepstrum coefficients and short-time power values can be used; details, including other parameters, can be found in Furui, "Acoustics and Speech Engineering," Kindai Kagaku-sha, 1992, and elsewhere. The HMM network storage means 202 stores the HMM network that represents the recognition targets (sentences or words). The present invention does not limit how the HMM network is stored in the HMM network storage means 202. The collating means 201 calculates the probability that the feature vector sequence, which is the input pattern, corresponds to each of the patterns represented in the HMM network. The optimal path calculating means 203 finds, among the probabilities obtained from the collating means 201, the maximum probability over the states that can be terminal states, and finds the optimal path having that probability. The maximum likelihood state calculating means 204 finds the maximum among all probabilities obtained from the collating means and finds the state having that probability. The determining means 205 determines whether the state obtained by the maximum likelihood state calculating means 204 exists on the optimal path obtained by the optimal path calculating means 203. If it does, the determining means outputs the pattern represented by that optimal path as the recognition result; if it does not, it outputs rejection as the recognition result. The determining means 205 may use the reject method shown in FIG. 1 and its description.
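As a rough illustration of what the speech analysis means 207 does, the sketch below cuts the sampled amplitude sequence into blocks of a fixed number of samples and emits one feature vector per block. It computes only a short-time log power value; LPC cepstrum analysis is omitted. The function name `short_time_log_power`, the parameters `frame_len` and `hop`, and the small floor added before the logarithm are assumptions made for the example.

```python
import math

def short_time_log_power(samples, frame_len=4, hop=4):
    """Convert amplitude samples into a sequence of one-element feature vectors.

    Each frame covers frame_len samples; consecutive frames start hop samples apart.
    Trailing samples that do not fill a whole frame are dropped.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        features.append([math.log(power + 1e-10)])  # floor avoids log(0) on silence
    return features

# Four silent samples followed by four samples of unit amplitude.
feats = short_time_log_power([0.0] * 4 + [1.0] * 4)
print(len(feats))                # 2
print(feats[0][0] < feats[1][0]) # True: the silent frame has lower log power
```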

[0030] Alternatively, as shown in FIG. 3, the determining means 205 may compare the probability of the most probable state among the states that can be terminal states of the HMM network (Pbest) with the highest probability among all states on the HMM network, and output "not applicable" as the recognition result when the difference is larger than a predetermined threshold.

[0031]

[Effects of the Invention] According to the present invention, rejection can be realized with almost no increase in the amount of processing or the storage capacity. When applied to speech recognition, for example, short noises such as breaths, coughs, and throat clearing can be rejected, as can unknown words. Errors in speech section detection for devoiced speech can also be detected. An application system implementing the rejection method of the present invention can therefore prevent system malfunction caused by erroneous user input or by misrecognition in the recognition process.

[0032] The larger the HMM network, the more arbitrary input patterns it can represent, and the more readily the rejection of the present invention takes effect.

[0033]

[Brief description of drawings]

FIG. 1 is a flowchart showing an example of the processing procedure of the present invention.

FIG. 2 is a block diagram showing an example of an HMM network in the present invention.

FIG. 3 is a flowchart showing another example of the processing procedure of the present invention.

FIG. 4 is a block diagram showing one embodiment of the configuration of a time-series pattern recognition device according to the present invention.

FIG. 5 is a block diagram showing one embodiment of the configuration of a speech recognition device according to the present invention.

[Explanation of symbols]

101: collating means; 102: HMM network storage means; 103: optimal path calculating means; 104: maximum likelihood state calculating means; 105: determining means.

Claims

1. A reject method in time-series pattern recognition processing based on a probabilistic model including an automaton or the like, wherein, at the point when the probability calculation for each state of the probabilistic model with respect to a time-series input pattern has finished, the optimal path obtained from the state having the maximum probability among the states that can be terminal states of the probabilistic model representing the recognition targets, and the state having the maximum probability among all states of the probabilistic model representing the recognition targets, are obtained, and rejection is determined when the state having the maximum probability among all states is not a state on the optimal path.

2. A reject method in time-series pattern recognition processing based on a probabilistic model including an automaton or the like, wherein, at the point when the probability calculation for each state of the probabilistic model with respect to a time-series input pattern has finished, the state having the maximum probability among the states that can be terminal states of the probabilistic model representing the recognition targets, and the state having the maximum probability among all states of the probabilistic model representing the recognition targets, are obtained, the probability values of the two maximum-probability states are compared, and rejection is determined when the difference between the probability values is larger than a predetermined threshold.

3. A reject method in time-series pattern recognition processing based on an HMM, wherein, at the point when the probability calculation for each state of the HMM network with respect to a time-series input pattern has finished, the optimal path obtained from the state having the maximum probability among the states that can be terminal states of the HMM network representing the recognition targets, and the state having the maximum probability among all states of the HMM network representing the recognition targets, are obtained, and rejection is determined when the state having the maximum probability among all states is not a state on the optimal path.

4. A reject method in time-series pattern recognition processing based on an HMM, wherein, at the point when the probability calculation for each state of the HMM network with respect to a time-series input pattern has finished, the state having the maximum probability among the states that can be terminal states of the HMM network representing the recognition targets, and the state having the maximum probability among all states of the HMM network representing the recognition targets, are obtained, the probability values of the two maximum-probability states are compared, and rejection is determined when the difference between the probability values is larger than a predetermined threshold.

5. A speech recognition device comprising: a speech input means that captures speech as an electrical signal; a speech analysis means that converts the features of the input speech into a time-series pattern and outputs it; a storage means that holds an HMM network representing the speech patterns to be recognized; a collating means that matches the HMM network held in the storage means against the time-series pattern and calculates the probability of each state on the HMM network; an optimal path calculating means that obtains the optimal path from the state having the maximum probability among the states that can be terminal states of the HMM network; a maximum likelihood calculating means that obtains the state having the maximum probability among all states of the HMM network; and a determining means that determines whether the state having the maximum probability is a state on the optimal path and, when it is not on the optimal path, outputs a recognition result of "not applicable".

6. A speech recognition device comprising: a speech input means that captures speech as an electrical signal; a speech analysis means that converts the features of the input speech into a time-series pattern and outputs it; a storage means that holds an HMM network representing the speech patterns to be recognized; a collating means that matches the HMM network held in the storage means against the time-series pattern and calculates the probability of each state on the HMM network; an optimal path calculating means that obtains the optimal path from the state having the maximum probability among the states that can be terminal states of the HMM network; a maximum likelihood calculating means that obtains the state having the maximum probability among all states of the HMM network; and a determining means that obtains the probability of the optimal path, compares it with the maximum probability, and outputs a recognition result of "not applicable" when the comparison result is larger than a predetermined threshold.