JPH0962292A - Reject method in time series pattern recognition processing and time series pattern recognition device mounting it - Google Patents

Reject method in time series pattern recognition processing and time series pattern recognition device mounting it

Info

Publication number
JPH0962292A
Authority
JP
Japan
Prior art keywords
state
probability
states
maximum
hmm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP7215674A
Other languages
Japanese (ja)
Other versions
JP3533773B2 (en)
Inventor
Toshiyuki Odaka
俊之 小高
Akio Amano
明雄 天野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to JP21567495A
Publication of JPH0962292A
Application granted
Publication of JP3533773B2
Anticipated expiration
Expired - Fee Related

Abstract

PROBLEM TO BE SOLVED: To reduce the amount of processing by obtaining the optimum path traced from the state having the maximum probability among the states that can terminate the HMM network, obtaining the state having the maximum probability among all states, and judging the input as a rejection when the latter state is not on the optimum path. SOLUTION: An optimum path calculation means 103 finds, among the probabilities obtained from a collation means 101, the maximum over the states that can be terminal states, and finds the optimum path having that probability. A maximum likelihood state calculation means 104 finds the maximum among all probabilities obtained from the collation means 101 and finds the state having that probability. A determination means 105 judges whether the state obtained by the maximum likelihood state calculation means 104 lies on the optimum path obtained by the optimum path calculation means 103; if it does, the pattern represented by that optimum path is output as the recognition result, and if not, a rejection is output as the recognition result.

Description

[Detailed Description of the Invention]

[0001]

[Field of Industrial Application] The present invention relates to a time-series pattern recognition device, and more particularly to a reject method capable of detecting the input of a pattern that is not a recognition target, and to a time-series pattern recognition device implementing that method.

[0002] Here, a reject method, or rejection, is a function by which the recognition process outputs, as its recognition result, "input of data outside the recognition targets" or "no applicable (high-confidence) recognition result".

[0003]

[Prior Art] In general, a recognition process outputs, as the recognition result, the reference data most similar to the input data among the reference data given in advance as recognition targets.

[0004] Speech, music, handwritten characters, sign language, gestures, moving images, and the like can be converted into time-series patterns and recognized using, for example, an HMM (Hidden Markov Model), which is a statistical technique. Rejection is an essential function when these recognition technologies are built into application systems and put to practical use. In a speech recognition application, for example, a system that incorporates rejection and prompts the user to speak again is easier to use than one that malfunctions by acting on a misrecognized utterance.

[0005] One conceivable way to realize rejection is to set an absolute threshold on the probability value in advance. However, even when the recognition result is the same, the probability value obtained from the HMM varies with the environment in which the input data was produced, so an appropriate absolute threshold is difficult to set.

[0006] An alternative is rejection that sets a relative threshold on the probability value. An example in speech recognition is the method described in "Watanabe et al.: Rejection of unknown-word utterances by likelihood correction using syllable recognition, IEICE Transactions, Vol. J75-D-II, No. 12, pp. 2002-2009". This method matches the input against a first HMM network (corresponding to the standard patterns) that represents the words or sentences to be recognized by the application, and, separately from this first network, also matches it against a second HMM network that represents syllable strings with no constraint on syllable order (also called a syllable typewriter). Rejection is realized by referring to the probability value from the second HMM network as a relative baseline.

[0007]

[Problems to Be Solved by the Invention] The conventional reject method for time-series pattern recognition described above has the problem that matching against the second HMM network increases both the amount of processing and the memory capacity required. More computation means slower response. In environments with limited computing resources, memory usage must also be kept as small as possible. For example, when speech recognition is considered as an input means for a portable terminal, the same level of functionality must be achieved with as little computation and memory as possible, also from the standpoint of power consumption.

[0008] An object of the present invention is to provide a reject method that does not increase the amount of processing or the memory capacity in time-series pattern recognition based on a probabilistic model, including automata and the like. In particular, it provides a reject method using an HMM, which is a kind of automaton.

[0009] Another object of the present invention is to provide a reject method that adds no processing or memory beyond the matching of the HMM network representing the recognition targets, and a time-series pattern recognition device implementing that reject method. In particular, it provides a reject method that prevents malfunctions caused by sounds or speech outside the recognition targets, or by erroneous input of speech with a portion missing (devoiced speech), and a speech recognition device implementing that reject method.

[0010]

[Means for Solving the Problems] To achieve the above objects, the present invention provides a reject method, and a time-series pattern recognition device implementing it, in which, once the probability of each state on the HMM network has been computed for the time-series input pattern, the optimum path traced from the state having the maximum probability among the states that can be terminal states of the HMM network representing the recognition targets is obtained, together with the state having the maximum probability among all states of that HMM network; rejection is determined when the state having the maximum probability among all states is not a state on the optimum path. Input of a pattern outside the recognition targets is thereby detected and malfunction is prevented.

[0011] In a specific configuration of the present invention, in time-series pattern recognition based on a probabilistic model, including automata and the like, once the probability of each state on the probabilistic model has been computed for the time-series input pattern, the optimum path traced from the state having the maximum probability among the states that can be terminal states of the probabilistic model representing the recognition targets is obtained, together with the state having the maximum probability among all states of that model; rejection is determined when the state having the maximum probability among all states is not a state on the optimum path.

[0012] In another configuration of the present invention, in time-series pattern recognition based on a probabilistic model, including automata and the like, once the probability of each state on the probabilistic model has been computed for the time-series input pattern, the state having the maximum probability among the states that can be terminal states of the probabilistic model representing the recognition targets and the state having the maximum probability among all states of that model are obtained; the probability values of these two states are compared, and rejection is determined when their difference is larger than a predetermined threshold.

[0013] In a specific configuration using an HMM, in time-series pattern recognition based on the HMM, once the probability of each state on the HMM network has been computed for the time-series input pattern, the optimum path traced from the state having the maximum probability among the states that can be terminal states of the HMM network representing the recognition targets is obtained, together with the state having the maximum probability among all states of that HMM network; rejection is determined when the state having the maximum probability among all states is not a state on the optimum path.

[0014] In another specific configuration using an HMM, in time-series pattern recognition based on the HMM, once the probability of each state on the HMM network has been computed for the time-series input pattern, the state having the maximum probability among the states that can be terminal states of the HMM network representing the recognition targets and the state having the maximum probability among all states of that HMM network are obtained; the probability values of these two states are compared, and rejection is determined when their difference is larger than a predetermined threshold.

[0015] The speech recognition device of the present invention comprises: speech input means for capturing speech as an electrical signal; speech analysis means for converting the features of the input speech into a time-series pattern and outputting it; storage means for holding an HMM network representing the speech patterns to be recognized; collation means for matching the HMM network stored in the storage means against the time-series pattern and computing the probability of each state on the HMM network; optimum path calculation means for obtaining the optimum path traced from the state having the maximum probability among the states that can be terminal states of the HMM network; maximum likelihood calculation means for obtaining the state having the maximum probability among all states of the HMM network; and determination means for determining whether the state having the maximum probability is a state on the optimum path and outputting the recognition result "not applicable" when it is not.

[0016] Alternatively, the determination means may obtain the probability of the optimum path, compare it with the maximum probability, and output the recognition result "not applicable" when the result of the comparison is larger than a predetermined threshold.

[0017]

[Operation] According to the present invention, patterns outside the recognition targets can be detected using only the matching of the HMM network that represents the recognition targets, so rejection is realized with almost no increase in processing or memory capacity.

[0018]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. Pattern recognition by HMM is described in detail in, for example, "Seiichi Nakagawa: Speech Recognition by Probabilistic Models, Institute of Electronics, Information and Communication Engineers, 1988", and is not detailed here. The present invention does not limit the type of recognition-unit model, such as the HMM.

[0019] FIG. 1 is a flowchart showing one embodiment of the processing procedure of the present invention. FIG. 2 is a diagram showing an example of an HMM network.

[0020] The processing procedure of the present invention is described with reference to the flowchart of FIG. 1. First, the probabilities on the HMM network are computed for the input pattern (1001). The input pattern is a time-series pattern; expressed as a string of symbols such as letters of the alphabet, it might look like "aaaabccccdd". At initialization, the start state (Ss in FIG. 2) is given probability 1 and all other states probability 0. Thereafter, at each unit of time, the probability that a state holding a probability (S1) transitions to a state connected to it by an arc (S2) and that the input symbol of that time appears is computed and becomes the new probability of the destination state S2. At the final time point, every state on the HMM network that a transition has reached holds a probability value. Ordinarily, at this final time point the probabilities are compared only among the states that can be terminal states of the HMM network (Sf1, Sf2, Sf3, ... in FIG. 2), the state with the highest probability among them is found, and the path on the HMM network that led to it is selected as the recognition-result candidate (1002). In the example of FIG. 2, this is "ABC". Let the state found here be Sbest and the selected path Rbest. The present invention additionally finds the state with the highest probability among all states on the HMM network (1003); let this state be Smax. Next, it is checked whether the state Smax is one of the states on the path Rbest (1004). If Smax is one of the states on Rbest, the pattern corresponding to Rbest ("ABC") is output as the recognition result (1005); if Smax is not a state on Rbest, "not applicable" is output as the recognition result (1006). The function of outputting "not applicable" is rejection.
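
The procedure of FIG. 1 (steps 1001 to 1006) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the toy network builder, the state names, the discrete emission probabilities, and the helper names (build_network, viterbi, backtrace, recognize) are all hypothetical, and a max-product (Viterbi-style) update is assumed for the probability computation.

```python
def build_network(words, p_stay=0.5, p_match=0.9):
    """Hypothetical toy network: one left-to-right state per letter of each word."""
    alphabet = sorted({ch.lower() for w in words for ch in w})
    p_other = (1.0 - p_match) / (len(alphabet) - 1)
    arcs, emit, finals = {"Ss": []}, {}, {}
    for w in words:
        names = [f"{w}:{i}" for i in range(len(w))]
        arcs["Ss"].append((names[0], 1.0 / len(words)))
        for i, name in enumerate(names):
            emit[name] = {a: (p_match if a == w[i].lower() else p_other)
                          for a in alphabet}
            last = i == len(w) - 1
            arcs[name] = [(name, 1.0 if last else p_stay)]   # self-loop
            if not last:
                arcs[name].append((names[i + 1], 1.0 - p_stay))
        finals[names[-1]] = w                                # terminal state -> label
    return arcs, emit, finals


def viterbi(arcs, emit, observations, start="Ss"):
    """Step 1001: start state gets probability 1, then propagate frame by frame."""
    prob, backptrs = {start: 1.0}, []
    for sym in observations:
        new_prob, bp = {}, {}
        for s1, p1 in prob.items():
            for s2, a in arcs.get(s1, []):
                p = p1 * a * emit[s2].get(sym, 0.0)
                if p > new_prob.get(s2, 0.0):
                    new_prob[s2], bp[s2] = p, s1
        prob = new_prob
        backptrs.append(bp)
    return prob, backptrs


def backtrace(backptrs, state):
    """Recover the state sequence that led to `state` at the final time point."""
    path = [state]
    for bp in reversed(backptrs):
        state = bp[state]
        path.append(state)
    return path[::-1]


def recognize(arcs, emit, finals, observations):
    """Return the recognized label, or None for "not applicable" (rejection)."""
    prob, backptrs = viterbi(arcs, emit, observations)
    reachable = [s for s in finals if s in prob]
    if not reachable:
        return None
    s_best = max(reachable, key=prob.get)        # step 1002: best terminal state
    r_best = backtrace(backptrs, s_best)         # step 1002: path Rbest
    s_max = max(prob, key=prob.get)              # step 1003: best state overall
    if s_max in r_best:                          # step 1004
        return finals[s_best]                    # step 1005
    return None                                  # step 1006


arcs, emit, finals = build_network(["ABCD", "AC"])
for text in ("aabbccdd", "aacc", "aabb"):
    print(text, "->", recognize(arcs, emit, finals, text))
```

In this toy vocabulary the inputs "aabbccdd" and "aacc" are accepted as "ABCD" and "AC". The input "aabb" is rejected: the best terminal state belongs to "AC", but the state with the highest probability overall is the second state of "ABCD", which is not on Rbest.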

[0021] Several paths or states with the same probability may be found for Rbest and for Smax. The same idea applies: if there is a combination in which Smax is a state on Rbest, the corresponding result is output; otherwise rejection is determined.
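
The tie handling described here can be sketched as below. This is an illustrative assumption about how ties might be collected, not something the source specifies: the function names, the path_of mapping (terminal state to the list of states on its best path), and the floating-point tolerance are all hypothetical.

```python
import math


def tied_argmax(scores, candidates, rel_tol=1e-12):
    """Return every candidate whose score ties with the maximum (within rel_tol)."""
    candidates = list(candidates)
    if not candidates:
        return []
    top = max(scores[s] for s in candidates)
    return [s for s in candidates if math.isclose(scores[s], top, rel_tol=rel_tol)]


def decide_with_ties(prob, finals, path_of):
    """Accept if ANY tied Smax lies on the path of ANY tied Sbest; otherwise reject.

    prob    : final-time probability of each reached state
    finals  : iterable of terminal states
    path_of : terminal state -> list of states on its best path (assumed given)
    Returns the accepted terminal state, or None for rejection.
    """
    best_finals = tied_argmax(prob, [s for s in finals if s in prob])
    max_states = tied_argmax(prob, prob)
    for s_best in best_finals:
        if any(s_max in path_of[s_best] for s_max in max_states):
            return s_best
    return None


paths = {"f1": ["Ss", "a", "f1"], "f2": ["Ss", "m", "f2"]}
print(decide_with_ties({"f1": 0.2, "f2": 0.2, "m": 0.3}, ["f1", "f2"], paths))
print(decide_with_ties({"f1": 0.2, "f2": 0.2, "z": 0.3}, ["f1", "f2"], paths))
```

In the first call the two terminal states tie, and the overall maximum "m" lies only on the path of "f2", so "f2" is accepted. In the second call the overall maximum "z" is on neither path, so the input is rejected.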

[0022] When the input pattern is highly plausible as data within the recognition targets, matching goes well and Sbest and Smax should coincide. For input outside the recognition targets, however, they do not necessarily coincide. This means that, within a part of the given HMM network, the path that reached the maximum-probability state Smax forcibly yields an approximate solution for the recognition of the input out-of-target time-series pattern. The larger the HMM network, the broader its expressive range, the easier it is to obtain an approximate solution close to the true one, and the easier it becomes to detect input outside the recognition targets. Taking a speech recognition device as an example, if the network is sufficiently large, the result closely approximates that of syllable recognition, in which syllables are recognized one at a time. By the same principle as for detecting out-of-target input, short noises such as breaths, coughs, and throat clearing can be rejected, and unknown words can be rejected as well. Errors in speech-segment detection, for example on devoiced speech, can also be detected.

[0023] The processing required for steps 1003 and 1004 in FIG. 1 is negligibly small compared with the probability computation over the HMM network for the input pattern (1001). This method also has the advantage that no troublesome threshold setting is needed.
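
A rough operation count makes the cost comparison concrete. The symbols below are introduced here and do not appear in the source: T is the number of input frames, |S| the number of states, and |A| the number of arcs. The count assumes that step 1001 updates every arc once per frame (a Viterbi-style computation) and that the backpointers needed to trace Rbest are already kept for ordinary recognition.

```latex
\begin{align*}
N_{1001} &\approx T\,|A|  && \text{(one update per arc per frame)} \\
N_{1003} &\approx |S|     && \text{(one scan over the final-time probabilities)} \\
N_{1004} &\approx T + 1   && \text{(one membership test along } R_{best}\text{)} \\
\frac{N_{1003} + N_{1004}}{N_{1001}} &\approx \frac{|S| + T + 1}{T\,|A|}
\end{align*}
```

With hypothetical values T = 100, |S| = 1000, and |A| = 2000, the ratio is 1101 / 200000, or roughly 0.55% of the cost of step 1001.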

[0024] FIG. 3 shows another embodiment of the processing procedure according to the present invention. This embodiment differs from the preceding one in that a threshold is set on the probability value to make the reject determination.

[0025] The processing procedure is described with reference to the flowchart of FIG. 3. First, the probabilities on the HMM network are computed for the input pattern (2001). Ordinarily, at the final time point the probabilities are compared only among the states that can be terminal states of the HMM network, the state with the highest probability among them is found, and the path on the HMM network that led to it is selected as the recognition-result candidate (2002). Let the state found here be Sbest and the selected path Rbest. Up to this point the procedure is the same as in the first embodiment described with FIG. 1. In addition, the probability at Sbest is stored as Pbest (2002). Next, the highest probability Pmax among all states on the HMM network is found (2003). Then the difference between Pmax and Pbest is divided by the length of the input pattern (N frames) to normalize it, giving the reference probability value ΔP (2004). Next, ΔP is compared with a preset threshold Pth (2005). If ΔP is smaller than Pth, the pattern corresponding to Pbest is output as the recognition result (2006); if ΔP is larger than Pth, "not applicable" is output as the recognition result (2006).
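
The threshold test of FIG. 3 (steps 2002 to 2005) can be sketched as follows. Two points are assumptions rather than statements from the source: the scores are taken to be log-probabilities, which makes a per-frame difference meaningful, and the case ΔP = Pth, which the text leaves open, is treated as acceptance. The function name and the example numbers are hypothetical.

```python
import math


def margin_reject(log_prob, finals, n_frames, p_th):
    """Return (rejected, delta_p) using delta_p = (Pmax - Pbest) / N.

    log_prob : final-time log-probability of each reached state (assumed log domain)
    finals   : iterable of terminal states
    n_frames : length N of the input pattern in frames
    p_th     : preset threshold Pth
    """
    reachable = [s for s in finals if s in log_prob]
    if not reachable or n_frames <= 0:
        return True, math.inf                         # nothing to accept
    p_best = max(log_prob[s] for s in reachable)      # step 2002: Pbest
    p_max = max(log_prob.values())                    # step 2003: Pmax
    delta_p = (p_max - p_best) / n_frames             # step 2004: normalize by N
    return delta_p > p_th, delta_p                    # step 2005


scores = {"B": math.log(0.04), "C": math.log(0.001), "Y": math.log(1e-7)}
print(margin_reject(scores, ["C", "Y"], n_frames=4, p_th=0.5))
print(margin_reject(scores, ["C", "Y"], n_frames=4, p_th=1.0))
```

Here ΔP = ln(40) / 4, about 0.92, so the input is rejected with Pth = 0.5 and accepted with Pth = 1.0. When the overall best state is itself a terminal state, ΔP is 0 and the input is accepted for any positive threshold.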

[0026] FIG. 4 is a block diagram showing the configuration of a pattern recognition device according to the present invention.

[0027] In FIG. 4, the HMM network storage means 102 stores the HMM network representing the recognition targets. The present invention does not limit how the HMM network is stored in the HMM network storage means 102. The collation means 101 computes the probability that the input pattern is each of the individual patterns represented in the HMM network. The optimum path calculation means 103 finds, among the probabilities obtained from the collation means 101, the maximum over the states that can be terminal states, and finds the optimum path having that probability. The maximum likelihood state calculation means 104 finds the maximum among all the probabilities obtained from the collation means and finds the state having that probability. The determination means 105 determines whether the state obtained by the maximum likelihood state calculation means 104 lies on the optimum path obtained by the optimum path calculation means 103; if it does, the pattern represented by that optimum path is output as the recognition result, and if it does not, rejection is output as the recognition result.
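
The block structure of FIG. 4 might be wired together as in the sketch below. The class, its fields, and the shape of the data passed between the blocks are assumptions made for illustration; the source only names the means 101 to 105 and does not prescribe an interface. The collation step is injected as a callable so that any probability computation can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Scores = Dict[str, float]       # final-time probability of each reached state
Paths = Dict[str, List[str]]    # best state sequence leading to each state


@dataclass
class PatternRecognizer:
    network: dict                                                    # 102: stored HMM network
    finals: Dict[str, str]                                           # terminal state -> label
    collate: Callable[[dict, Sequence[str]], Tuple[Scores, Paths]]   # 101: collation means

    def optimal_path(self, scores: Scores, paths: Paths):            # 103
        reachable = [s for s in self.finals if s in scores]
        if not reachable:
            return None
        s_best = max(reachable, key=scores.__getitem__)
        return s_best, paths[s_best]

    def max_likelihood_state(self, scores: Scores) -> str:           # 104
        return max(scores, key=scores.__getitem__)

    def decide(self, s_best: str, r_best: List[str], s_max: str) -> Optional[str]:  # 105
        return self.finals[s_best] if s_max in r_best else None

    def recognize(self, pattern: Sequence[str]) -> Optional[str]:
        scores, paths = self.collate(self.network, pattern)
        best = self.optimal_path(scores, paths) if scores else None
        if best is None:
            return None                                              # rejection
        s_best, r_best = best
        return self.decide(s_best, r_best, self.max_likelihood_state(scores))


# Stub collation with fixed, hypothetical results, for demonstration only.
def stub_collate(network, pattern):
    return network["scores"], network["paths"]


net = {"scores": {"B": 0.04, "C": 0.001, "Y": 0.0001},
       "paths": {"C": ["Ss", "A", "B", "C"], "Y": ["Ss", "X", "Y"]}}
rec = PatternRecognizer(net, {"C": "ABC", "Y": "XY"}, stub_collate)
print(rec.recognize("aabb"))
```

Keeping the collation means behind a callable mirrors the statement that the storage format of the network is not limited: only blocks 103 to 105 depend on the final-time scores and paths.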

[0028] FIG. 5 is a block diagram showing one embodiment of a speech recognition device according to the present invention.

[0029] In FIG. 5, the speech input means 206 consists of a microphone and an analog-to-digital converter; it takes as input the speech waveform, which is a vibration of the air, and outputs a data sequence of amplitude values sampled at a fixed sampling period. The speech analysis means 207 analyzes the data sequence output by the speech input means 206 at fixed time intervals or in blocks of a fixed number of samples, converts the features of the input data sequence into a sequence of feature vectors, each containing several parameters, and outputs it. This feature vector sequence corresponds to the symbol string in the description of FIG. 1. The present invention does not limit the kinds of parameters used in this analysis. For example, LPC cepstrum coefficients and short-time power values can be used; details, including other parameters, are given in "Furui: Acoustics and Speech Engineering, Kindai Kagaku-sha, 1992" and elsewhere. The HMM network storage means 202 stores the HMM network representing the recognition targets (sentences or words). The present invention does not limit how the HMM network is stored in the HMM network storage means 202. The collation means 201 computes the probability that the feature vector sequence, which is the input pattern, is each of the individual patterns represented in the HMM network. The optimum path calculation means 203 finds, among the probabilities obtained from the collation means 201, the maximum over the states that can be terminal states, and finds the optimum path having that probability. The maximum likelihood state calculation means 204 finds the maximum among all the probabilities obtained from the collation means and finds the state having that probability. The determination means 205 determines whether the state obtained by the maximum likelihood state calculation means 204 lies on the optimum path obtained by the optimum path calculation means 203; if it does, the pattern represented by that optimum path is output as the recognition result, and if it does not, rejection is output as the recognition result. The determination means 205 may use the reject method shown in FIG. 1 and its description.
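
The role of the speech analysis means 207 can be illustrated with a deliberately small front end that cuts the sample sequence into fixed-length frames and computes one of the parameters named above, the short-time power, for each frame. The frame length, hop size, floor value, and function names are assumptions chosen for the example; a practical analyzer would add further parameters such as LPC cepstrum coefficients.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Cut the sample sequence into frames of frame_len samples, hop samples apart."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_log_power(frame, floor=1e-10):
    """Log of the mean squared amplitude, floored so that silence stays finite."""
    power = sum(x * x for x in frame) / len(frame)
    return math.log(max(power, floor))


def analyze(samples, frame_len=200, hop=80):
    """Return one feature vector per frame; here each vector holds a single parameter."""
    return [[short_time_log_power(f)] for f in frame_signal(samples, frame_len, hop)]


# Hypothetical input: 800 samples of a sine wave with a period of 40 samples.
samples = [math.sin(2 * math.pi * i / 40) for i in range(800)]
features = analyze(samples)
print(len(features), features[0])
```

Each feature vector plays the role of one symbol in the description of FIG. 1, so the resulting sequence is what the collation means 201 would receive.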

[0030] Alternatively, as shown in FIG. 3, the determination means 205 may compare the probability (Pbest) of the highest-probability state among the states that can be terminal states of the HMM network with the highest probability among all states on the HMM network, and output "not applicable" as the recognition result when the difference is larger than a predetermined threshold.

[0031]

[Effects of the Invention] According to the present invention, rejection can be realized with almost no increase in processing or memory capacity. When applied to speech recognition, for example, short noises such as breaths, coughs, and throat clearing can be rejected, and unknown words can be rejected as well. Errors in speech-segment detection on devoiced speech can also be detected. An application system implementing the rejection method of the present invention can therefore prevent system malfunctions caused by erroneous user input or by misrecognition in the recognition process.

[0032] The larger the HMM network, the more arbitrary input patterns it can represent, and the more effective rejection according to the present invention becomes.

[0033]

[Brief Description of the Drawings]

[FIG. 1] A flowchart showing an example of the processing procedure of the present invention.

[FIG. 2] A block diagram showing an example of an HMM network in the present invention.

[FIG. 3] A flowchart showing another example of the processing procedure of the present invention.

[FIG. 4] A block diagram showing one embodiment of the configuration of a time-series pattern recognition device according to the present invention.

[FIG. 5] A block diagram showing one embodiment of the configuration of a speech recognition device according to the present invention.

[Explanation of Symbols]

101: collation means; 102: HMM network storage means; 103: optimal path calculation means; 104: maximum likelihood state calculation means; 105: judgment means.

Claims (6)

[Claims]

1. A rejection method in time-series pattern recognition processing based on a probabilistic model, including an automaton or the like, characterized in that, at the point when the probability calculation of each state on the probabilistic model for a time-series input pattern has finished, the method obtains the optimal path derived from the state having the maximum probability among the states that can be terminal in the probabilistic model representing the recognition targets, and the state having the maximum probability among all states in that probabilistic model, and determines rejection when the state having the maximum probability among all states is not a state on the optimal path.
2. A rejection method in time-series pattern recognition processing based on a probabilistic model, including an automaton or the like, characterized in that, at the point when the probability calculation of each state on the probabilistic model for a time-series input pattern has finished, the method obtains the state having the maximum probability among the states that can be terminal in the probabilistic model representing the recognition targets, and the state having the maximum probability among all states in that probabilistic model, compares the probability values of these two maximum-probability states, and determines rejection when the difference between the probability values is larger than a predetermined threshold value.
3. A rejection method in time-series pattern recognition processing based on an HMM, characterized in that, at the point when the probability calculation of each state on the HMM network for a time-series input pattern has finished, the method obtains the optimal path derived from the state having the maximum probability among the states that can be terminal in the HMM network representing the recognition targets, and the state having the maximum probability among all states in that HMM network, and determines rejection when the state having the maximum probability among all states is not a state on the optimal path.
4. A rejection method in time-series pattern recognition processing based on an HMM, characterized in that, at the point when the probability calculation of each state on the HMM network for a time-series input pattern has finished, the method obtains the state having the maximum probability among the states that can be terminal in the HMM network representing the recognition targets, and the state having the maximum probability among all states in that HMM network, compares the probability values of these two maximum-probability states, and determines rejection when the difference between the probability values is larger than a predetermined threshold value.
5. A voice recognition device characterized by comprising: voice input means for capturing voice as an electric signal; voice analysis means for converting features of the input voice into a time-series pattern and outputting it; storage means for holding an HMM network representing the voice patterns to be recognized; collation means for collating the HMM network stored in the storage means against the time-series pattern and calculating the probability of each state on the HMM network; optimal path calculation means for obtaining the optimal path derived from the state having the maximum probability among the states that can be terminal in the HMM network; maximum likelihood calculation means for obtaining the state having the maximum probability among all states in the HMM network; and determination means for determining whether or not the state having the maximum probability is a state on the optimal path and, when it is not on the optimal path, outputting a recognition result of "not applicable".
6. A voice recognition device characterized by comprising: voice input means for capturing voice as an electric signal; voice analysis means for converting features of the input voice into a time-series pattern and outputting it; storage means for holding an HMM network representing the voice patterns to be recognized; collation means for collating the HMM network stored in the storage means against the time-series pattern and calculating the probability of each state on the HMM network; optimal path calculation means for obtaining the optimal path derived from the state having the maximum probability among the states that can be terminal in the HMM network; maximum likelihood calculation means for obtaining the state having the maximum probability among all states in the HMM network; and determination means for obtaining the probability of the optimal path, comparing it with the maximum probability, and, when the comparison result is larger than a predetermined threshold value, outputting a recognition result of "not applicable".
JP21567495A 1995-08-24 1995-08-24 Reject method in time-series pattern recognition processing and time-series pattern recognition device implementing the same Expired - Fee Related JP3533773B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP21567495A JP3533773B2 (en) 1995-08-24 1995-08-24 Reject method in time-series pattern recognition processing and time-series pattern recognition device implementing the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP21567495A JP3533773B2 (en) 1995-08-24 1995-08-24 Reject method in time-series pattern recognition processing and time-series pattern recognition device implementing the same

Publications (2)

Publication Number Publication Date
JPH0962292A true JPH0962292A (en) 1997-03-07
JP3533773B2 JP3533773B2 (en) 2004-05-31

Family

ID=16676292

Family Applications (1)

Application Number Title Priority Date Filing Date
JP21567495A Expired - Fee Related JP3533773B2 (en) 1995-08-24 1995-08-24 Reject method in time-series pattern recognition processing and time-series pattern recognition device implementing the same

Country Status (1)

Country Link
JP (1) JP3533773B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7260527B2 (en) 2001-12-28 2007-08-21 Kabushiki Kaisha Toshiba Speech recognizing apparatus and speech recognizing method

Also Published As

Publication number Publication date
JP3533773B2 (en) 2004-05-31

Similar Documents

Publication Publication Date Title
JP3284832B2 (en) Speech recognition dialogue processing method and speech recognition dialogue device
EP3210205B1 (en) Sound sample verification for generating sound detection model
KR101183344B1 (en) Automatic speech recognition learning using user corrections
RU2393549C2 (en) Method and device for voice recognition
JP3826032B2 (en) Speech recognition apparatus, speech recognition method, and speech recognition program
CN100587806C (en) Speech recognition method and apparatus thereof
US6662159B2 (en) Recognizing speech data using a state transition model
JP2010510534A (en) Voice activity detection system and method
JP2001092496A (en) Continuous voice recognition device and recording medium
US6631348B1 (en) Dynamic speech recognition pattern switching for enhanced speech recognition accuracy
CN114155839A (en) Voice endpoint detection method, device, equipment and storage medium
KR101840363B1 (en) Voice recognition apparatus and terminal device for detecting misprononced phoneme, and method for training acoustic model
KR100429896B1 (en) Speech detection apparatus under noise environment and method thereof
US6823304B2 (en) Speech recognition apparatus and method performing speech recognition with feature parameter preceding lead voiced sound as feature parameter of lead consonant
JP3533773B2 (en) Reject method in time-series pattern recognition processing and time-series pattern recognition device implementing the same
KR20120046627A (en) Speaker adaptation method and apparatus
JP2000352993A (en) Voice recognition system and learning method of hidden markov model
KR20230118165A (en) Adapting Automated Speech Recognition Parameters Based on Hotword Attributes
KR100831991B1 (en) Information processing method and information processing device
US6438521B1 (en) Speech recognition method and apparatus and computer-readable memory
JPH11202895A (en) Speech recognition system, method therefor and recording medium recorded with program therefor
US20210398521A1 (en) Method and device for providing voice recognition service
KR20210052563A (en) Method and apparatus for providing context-based voice recognition service
JP4604424B2 (en) Speech recognition apparatus and method, and program
KR100677224B1 (en) Speech recognition method using anti-word model

Legal Events

Date Code Title Description
TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20040217

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20040301

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090319

Year of fee payment: 5

LAPS Cancellation because of no payment of annual fees