JPH11212592A

JPH11212592A - Pattern recognition device and standard pattern generating method

Info

Publication number: JPH11212592A
Application number: JP10012301A
Authority: JP
Inventors: Akio Amano; 明雄天野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1998-01-26
Filing date: 1998-01-26
Publication date: 1999-08-06

Abstract

PROBLEM TO BE SOLVED: To provide a standard pattern with high recognition accuracy by measuring how much the probability distributions initially created for the individual classes overlap with each other, and correcting the overlapping distributions so as to reduce their overlap. SOLUTION: For the completed probability distribution of each class, the degree of overlap between classes is measured (302). With N classes to be recognized there are (N^2 − N)/2 pairs, so the overlap of the distributions is measured for every pair. When a measured value exceeds a preset threshold, the overlap is considered large; the number of distributions exceeding the threshold is counted and denoted K (303). It is then determined whether K is greater than a preset threshold Kth (304). If K is not greater than Kth, the process ends; otherwise, those probability distributions are selected as distributions to be corrected (305) and corrected (306).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

1. Field of the Invention
The present invention relates to a pattern recognition device for identifying patterns such as characters, figures, and speech.

[0002]

2. Description of the Related Art
In a pattern recognition device based on statistical decision theory, the probability distribution of the feature vector is obtained in advance for each class; for given input data, a probability value is calculated for each class according to its distribution, and the class giving the maximum probability is taken as the recognition result. Such devices are described in many references, for example Makoto Nagao, "Pattern Information Processing", IEICE University Series 1-4, edited by the Institute of Electronics and Communication Engineers of Japan, Corona Publishing, pp. 106-109. The normal distribution is often used as the probability distribution. The shape of a normal distribution is fully determined by its mean and variance (or standard deviation), so the probability distribution of each class is obtained by collecting sample data for that class and calculating the mean and variance of the data. In a pattern recognition device that uses normal distributions, a normal distribution is thus created in advance for each class from sample data; at recognition time, the probability for each class is calculated from the feature vector of the input data, and the class giving the maximum probability is the recognition result.
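As a minimal sketch of the decision rule described above, the following Python fits a per-dimension mean and variance to each class's samples and returns the class with the highest probability. It assumes diagonal-covariance normal distributions and compares log probabilities, which preserves the maximum; the function names and toy data are hypothetical.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log density of a diagonal-covariance normal distribution at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def fit_class(samples):
    """Estimate the per-dimension mean and variance of one class's samples."""
    n = len(samples)
    dims = len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dims)]
    var = [sum((s[d] - mean[d]) ** 2 for s in samples) / n for d in range(dims)]
    return mean, var

def recognize(x, models):
    """Return the class label whose normal distribution gives x the highest probability."""
    return max(models, key=lambda label: gaussian_log_pdf(x, *models[label]))

# Hypothetical two-class training data.
training = {
    "A": [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]],
    "B": [[2.0, 2.1], [1.8, 1.9], [2.2, 2.0]],
}
models = {label: fit_class(s) for label, s in training.items()}
print(recognize([0.1, 0.0], models))  # A
print(recognize([1.9, 2.2], models))  # B
```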

[0003] As a conventional example of a speech recognition device, there are also devices that collect sample data for each class in advance and represent the standard pattern of each class by the probability distribution of a feature vector. In speech recognition, however, the object to be processed is not a feature vector at a single point in time but a time series of feature vectors, and a technique called the hidden Markov model is widely used to handle it. Speech recognition using hidden Markov models is described, for example, in Seiichi Nakagawa, "Speech Recognition by Stochastic Models", edited by the Institute of Electronics, Information and Communication Engineers, pp. 29-89. A hidden Markov model is a state transition model and handles a time series within the framework of state transitions. When hidden Markov models are used, the standard pattern is not a probability distribution of a feature vector at a single point in time but a state transition model in which a probability distribution is associated with each state (or state transition). Speech recognition adds the step of associating the time series of feature vectors of the input speech with the states of the state transition model, but the basic processing concept is the same as that of the pattern recognition device based on statistical decision theory described above. In speech recognition, too, the probability distribution of the speech feature vector is basically used as the standard pattern, and a normal distribution is again often used. For speaker-independent speech recognition, to cope with the variation of feature vectors between speakers, the probability distribution is often expressed not as a single normal distribution but as a mixture distribution, that is, a linear sum of multiple normal distributions.
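To illustrate why a linear sum of normal distributions can represent speaker variation better than a single normal, the following one-dimensional sketch compares a two-component mixture with a single normal having the same overall mean (0) and variance (0.25 + 1 = 1.25). The weights, means, and variances are hypothetical values chosen for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def mixture_pdf(x, components):
    """Density of a linear sum of normal distributions.

    components: list of (weight, mean, var); the weights are assumed to sum to 1.
    """
    return sum(w * normal_pdf(x, m, v) for w, m, v in components)

# Two hypothetical speaker groups whose feature values cluster around -1 and +1.
mixture = [(0.5, -1.0, 0.25), (0.5, 1.0, 0.25)]

# The mixture has its density where the data are, not between the groups.
print(mixture_pdf(0.0, mixture) < mixture_pdf(1.0, mixture))        # True
# A single normal with the same mean and variance peaks at 0, where no data lie.
print(normal_pdf(0.0, 0.0, 1.25) > mixture_pdf(0.0, mixture))       # True
```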

[0004]

In the prior art described above, the standard pattern of each class is prepared either as a single normal distribution or as a linear sum of a predetermined number of normal distributions for each class.

[0005] To reduce misrecognition in pattern recognition or speech recognition, it is important not only to improve the accuracy and reliability with which the probability distribution of each class is represented, but also to reduce the overlap between the distributions of competing classes.

[0006] An object of the present invention is to provide a standard pattern with high recognition accuracy by modifying the probability distributions according to the criterion of "reducing the overlap between the distributions of competing classes", which the prior art did not sufficiently consider.

[0007]

SUMMARY OF THE INVENTION The above object of the present invention is achieved by measuring the degree of overlap between the probability distributions once created for each class and, for distributions that overlap, correcting them so as to reduce the overlap.

[0008]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS An embodiment of the present invention will be described below with reference to the drawings. The present invention is applicable to the recognition of various patterns such as characters, figures, and speech; here, speech recognition is described as an example.

[0009] FIG. 1 is a block diagram showing the configuration of an embodiment of a word speech recognition device to which the present invention is applied. Input speech is converted into an electric signal by the speech input means 1. The electric signal is then analyzed by the speech analysis means 2, which outputs a time series of feature vectors. Meanwhile, the standard pattern concatenation means 7 concatenates the standard patterns of the basic recognition units, stored in advance in the standard pattern storage means 5, according to the information stored in the word dictionary 6, to form word standard patterns. The matching means 3 matches each standard pattern created by the standard pattern concatenation means 7 against the feature vector time series of the input speech and obtains a score for each word to be recognized. The determination means 4 outputs the recognition result based on the score of each word.

[0010] Next, the standard pattern of the basic recognition unit used in the present invention will be described. A basic recognition unit is a constituent element of a spoken word, such as a syllable or a smaller unit such as a phoneme (vowel or consonant). If a standard pattern is prepared for each basic recognition unit, the standard pattern of any spoken word can be constructed by combining them, so that a large vocabulary can be recognized with a limited number of standard patterns. Various units such as syllables, phonemes, and phoneme chains can serve as the basic recognition unit in speech recognition; for simplicity, this embodiment uses the syllable. FIG. 2 shows the probabilistic model (Hidden Markov Model, hereinafter HMM) corresponding to the standard pattern used in the present invention. In the figure, each circle represents a state, and each arrow represents a transition between states. The symbol aij attached to an arrow is the probability of a transition from state i to state j, and the symbol bij(k) is the probability that a feature vector belonging to the k-th category is output when the transition from state i to state j occurs. Given the feature vector time series of the input speech, the probability that this time series was output by the probabilistic model (HMM) can be calculated using these state transition probabilities and output probabilities. This probability calculation is performed in the matching means 3 of FIG. 1. For details of the calculation, the known method described in "Automatic Speech Recognition", Kluwer Academic Publishers, Norwell, MA, 1989, pp. 95-97, may be used.
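A minimal sketch of this probability calculation is the forward algorithm for an HMM whose outputs are attached to transitions, as with bij(k) above. It assumes the feature vectors have already been quantized into category indices, that the model starts in state 0 and must end in its last state, and it uses plain (unscaled) probabilities, which underflow on long utterances; practical implementations use scaling or log arithmetic. The two-state model is hypothetical.

```python
def forward_probability(a, b, observations, initial_state=0, final_state=None):
    """Probability that a transition-emitting HMM outputs the symbol sequence.

    a[i][j]      : probability of the transition i -> j
    b[i][j][k]   : probability of outputting category k on the transition i -> j
    observations : sequence of category indices, one per frame
    """
    n = len(a)
    if final_state is None:
        final_state = n - 1
    alpha = [0.0] * n            # alpha[j]: probability of being in state j so far
    alpha[initial_state] = 1.0
    for k in observations:
        new_alpha = [0.0] * n
        for i in range(n):
            if alpha[i] == 0.0:
                continue
            for j in range(n):
                # Take transition i -> j and output category k on it.
                new_alpha[j] += alpha[i] * a[i][j] * b[i][j][k]
        alpha = new_alpha
    return alpha[final_state]

# Hypothetical left-to-right model with two states and two categories.
a = [[0.6, 0.4],
     [0.0, 1.0]]
b = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.0, 0.0], [0.2, 0.8]]]
p = forward_probability(a, b, [0, 1])
print(round(p, 3))  # 0.268
```

The result sums the two paths that end in the final state: 0→0→1 (0.6·0.9·0.4·0.5 = 0.108) and 0→1→1 (0.4·0.5·1.0·0.8 = 0.16).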

[0011] Next, the method of concatenating standard patterns used in the speech recognition device of the present invention will be described with reference to FIG. 3, which illustrates how standard patterns are concatenated according to the word dictionary 6. As described above, the speech recognition device of the present invention uses HMMs, which are state transition models, as standard patterns, so standard patterns can be concatenated easily: the transition leaving the final state of the preceding model is simply made to lead to the first state of the following model. FIG. 3 uses Japanese syllables as the basic recognition unit and takes up the word "Hitachi" (/hitachi/) in the dictionary. The standard pattern storage means 5 stores an HMM for each Japanese syllable. To create the HMM for the word "Hitachi", the word dictionary 6 is first consulted to read out that "Hitachi" consists of the syllable sequence /hi/, /ta/, /chi/. Following this syllable sequence, the standard pattern concatenation means 7 reads the HMMs for /hi/, /ta/, and /chi/ from the standard pattern storage means 5 in order and concatenates them into one large HMM.
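The concatenation can be sketched with transition matrices. This sketch assumes each syllable HMM is a left-to-right model stored as an n×n matrix whose last row leaves some probability unassigned; that remainder is treated as the exit transition and redirected to the first state of the next model. Output probabilities are omitted, and the dictionary, syllable models, and function name are hypothetical.

```python
import numpy as np

def concatenate(models):
    """Join left-to-right unit HMMs (transition matrices) into one word HMM."""
    total = sum(m.shape[0] for m in models)
    word = np.zeros((total, total))
    offset = 0
    for idx, m in enumerate(models):
        n = m.shape[0]
        word[offset:offset + n, offset:offset + n] = m   # copy the unit's own transitions
        if idx + 1 < len(models):
            # Probability mass leaving the final state goes to the next unit's first state.
            word[offset + n - 1, offset + n] = 1.0 - m[-1].sum()
        offset += n
    return word

unit = np.array([[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5],
                 [0.0, 0.0, 0.5]])   # 0.5 left over in the last row is the exit
syllable_hmms = {"hi": unit, "ta": unit, "chi": unit}   # stand-in for storage means 5
word_dictionary = {"hitachi": ["hi", "ta", "chi"]}      # stand-in for word dictionary 6

word_hmm = concatenate([syllable_hmms[s] for s in word_dictionary["hitachi"]])
print(word_hmm.shape)  # (9, 9)
```

Every row except the last now sums to 1; the last row keeps its exit mass unassigned so the word model itself can be concatenated further if needed.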

[0012] Next, the usual training method for the HMMs used as standard patterns in the speech recognition device of the present invention will be described. HMM training is carried out by parameter estimation using a large number of training speech samples. FIG. 4 is a flowchart outlining the training flow. First, an initial HMM is created by some method (101); then the parameter re-estimation process (102) using the training speech samples is repeated until a convergence condition is satisfied (103). This training method is inherently an iterative estimation algorithm, and the accuracy of the model improves as the number of iterations increases, so the initial model need not be created with high accuracy. There are several ways to create the initial model; for example, a method that assigns random numbers is sufficient. The parameter re-estimation method is described later. Several convergence criteria are also possible; for example, fixing the number of iterations and terminating after a certain number of iterations (e.g., five) poses no practical problem.

[0013] When the convergence condition is satisfied, the iteration ends, and the parameters of each HMM obtained by parameter estimation are stored (104).

[0014] Next, the HMM parameter re-estimation process will be described. As shown in the flowchart of FIG. 4, this process is performed repeatedly within the training flow; one pass of it is described here with reference to the flowchart of FIG. 5. The re-estimation uses the training speech samples. If there are N training speech samples, a similar parameter estimation calculation is performed N times, and only after all of them are finished are the parameters of each HMM updated to new values. In the parameter estimation for each speech sample, the HMMs of the basic recognition units are first concatenated according to the utterance content of the sample (203), and parameters are estimated for the concatenated HMM using a technique called the Forward-Backward algorithm (204). Decomposing the concatenated HMM back into its original basic recognition units yields parameter estimates for the HMM of each basic recognition unit (205). At this point, however, the HMM parameters of the basic recognition units are not yet updated; after parameter estimates have been obtained for all speech samples, all of the estimates obtained so far are combined to update the HMM parameters of each basic recognition unit (207). For the specific calculation procedure of the parameter estimation (Forward-Backward algorithm), the known method described in "Automatic Speech Recognition", Kluwer Academic Publishers, Norwell, MA, 1989, pp. 95-97, may be used.
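The training flow of FIGS. 4 and 5 can be sketched with standard Baum-Welch (Forward-Backward) re-estimation. This is a simplified stand-in, not the method of the cited reference: it trains a single discrete HMM whose outputs are attached to states rather than transitions, and it omits the concatenation and decomposition of unit models (203, 205). It does follow the flow described above: random initial model (101), a fixed number of iterations (five), and statistics accumulated over all samples before any parameter is updated (207). The function names and sample data are hypothetical.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Scaled forward-backward pass; returns scaled alpha, beta and scale factors c."""
    T, N = len(obs), len(pi)
    alpha, beta, c = np.zeros((T, N)), np.zeros((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
    return alpha, beta, c

def train(samples, n_states, n_symbols, n_iter=5, seed=0):
    """Random initial model, then a fixed number of re-estimation passes."""
    rng = np.random.default_rng(seed)
    pi = rng.random(n_states)
    pi /= pi.sum()
    A = rng.random((n_states, n_states))
    A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols))
    B /= B.sum(axis=1, keepdims=True)
    log_likelihoods = []
    for _ in range(n_iter):
        pi_acc = np.zeros(n_states)
        A_acc = np.zeros((n_states, n_states))
        B_acc = np.zeros((n_states, n_symbols))
        total = 0.0
        for obs in samples:                   # accumulate over every sample first
            obs = np.asarray(obs)
            alpha, beta, c = forward_backward(pi, A, B, obs)
            gamma = alpha * beta              # state occupancy probability per frame
            pi_acc += gamma[0]
            for t in range(len(obs) - 1):     # expected transition counts
                A_acc += A * np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
            for k in range(n_symbols):        # expected output counts
                B_acc[:, k] += gamma[obs == k].sum(axis=0)
            total += np.log(c).sum()          # log-likelihood of this sample
        log_likelihoods.append(total)
        # Update the parameters only after all samples have been processed.
        pi = pi_acc / pi_acc.sum()
        A = A_acc / A_acc.sum(axis=1, keepdims=True)
        B = B_acc / B_acc.sum(axis=1, keepdims=True)
    return pi, A, B, log_likelihoods

samples = [[0, 0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1, 1]]
pi, A, B, ll = train(samples, n_states=2, n_symbols=2)
print(ll)  # total log-likelihood before each update; non-decreasing under Baum-Welch
```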

[0015] Next, the method of correcting the probability distributions that make up a standard pattern, which is the main point of the present invention, will be described. FIG. 6 is a flowchart illustrating the overall flow of this probability distribution correction method.

[0016] In this training, probability distributions are first created by a conventional standard method (301). For the resulting probability distribution of each class, the degree of overlap between classes is measured (302); the measurement method is described later. The overlap is measured for every pair of classes to be recognized: if the number of classes is N, there are (N^2 − N)/2 pairs, and the overlap of the distributions is measured for all of them. When a measured value exceeds a preset threshold, the overlap between those distributions is regarded as large; the number of distributions exceeding the threshold is counted and denoted K (303). Next, it is determined whether K is greater than a preset threshold Kth (304); if it is not, the process ends. If K is greater than Kth, those probability distributions are selected as distributions to be corrected (305) and corrected (306). After the selected distributions have been corrected, the process of measuring the overlap between the class distributions again (302), counting the number of distributions with large inter-class overlap (303), and determining whether that number is greater than the threshold Kth (304) is repeated. The process ends when the number of distributions with large inter-class overlap becomes Kth or less.
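The loop of FIG. 6 (steps 302 to 306) can be sketched as a control function that takes the overlap measure, the selection rule, and the correction as parameters. This sketch counts heavily overlapping class pairs as K; counting distinct distributions instead would be an equally plausible reading of step 303. In the usage example, the overlap measure is an assumed stand-in (the patent's equation (1) is not reproduced here), and the correction simply halves the standard deviation, a toy replacement for the splitting procedure of step 306. All names are hypothetical.

```python
import math
from itertools import combinations

def correct_until_separated(dists, overlap, select, correct, threshold, k_th, max_rounds=100):
    """Repeat steps 302-306 until at most k_th class pairs overlap too much."""
    for _ in range(max_rounds):
        # Steps 302-303: check all (N^2 - N)/2 class pairs.
        large = [(a, b) for a, b in combinations(sorted(dists), 2)
                 if overlap(dists[a], dists[b]) > threshold]
        if len(large) <= k_th:                      # step 304
            return dists, len(large)
        for a, b in large:                          # steps 305-306
            target = select(a, b, dists)
            other = b if target == a else a
            dists[target] = correct(target, other, dists)
    raise RuntimeError("overlap count did not fall to k_th within max_rounds")

def toy_overlap(d1, d2):
    """Assumed stand-in measure on (mean, std) pairs; 1 when the means coincide."""
    (m1, s1), (m2, s2) = d1, d2
    return math.exp(-(m1 - m2) ** 2 / (2 * (s1 ** 2 + s2 ** 2)))

def toy_select(a, b, dists):
    """Correct the distribution with the larger spread."""
    return a if dists[a][1] >= dists[b][1] else b

def toy_correct(label, other, dists):
    """Toy correction: halve the standard deviation."""
    mean, std = dists[label]
    return (mean, std / 2)

classes = {"a": (0.0, 2.0), "i": (1.0, 1.0), "u": (5.0, 1.0)}
result, k = correct_until_separated(classes, toy_overlap, toy_select, toy_correct,
                                    threshold=0.5, k_th=0)
print(k)  # 0
```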

[0017] Next, the method of measuring the degree of overlap between probability distributions in processing step 302 will be described. The probability distributions handled here are normal distributions. The shape of a normal distribution is determined by its mean and variance (standard deviation), so the overlap between normal distributions can be calculated from their respective means and variances. FIG. 7 illustrates how the degree of overlap is calculated for one-dimensional normal distributions. FIG. 7 shows the distributions of two classes, A and B; μA and μB are the means of classes A and B, respectively, and σA and σB are their standard deviations. The degree of overlap between the distributions of class A and class B can then be expressed by equation (1).

[0018]

(Equation 1)

[0019] That is, the closer the means of the two distributions and the larger their variances (standard deviations), the greater the degree of overlap. FIG. 7 and equation (1) describe the one-dimensional case, but actual speech recognition deals with multidimensional probability distributions. For a multidimensional distribution, the variance is in principle treated as a covariance matrix that accounts for correlations between dimensions, but to reduce the number of distribution parameters, the components other than the diagonal of the covariance matrix are often treated as zero. This embodiment also considers covariance matrices with diagonal components only. Let μAi and μBi be the i-th elements of the mean vectors of classes A and B, and σAi and σBi the (i, i) components of the covariance matrices of classes A and B, respectively. The degree of overlap between the distributions of class A and class B can then be expressed by equation (2).

[0020]

(Equation 2)
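Equations (1) and (2) are not reproduced in this text, so the following is not the patent's formula. It is an assumed stand-in with the qualitative behavior described in [0019], exp(−½ Σi (μAi − μBi)² / (σAi + σBi)), where σAi and σBi are the diagonal covariance components (variances). It equals 1 when the means coincide, grows with the variances, and shrinks as the means move apart. For the one-dimensional case of FIG. 7, pass the squared standard deviations as single-element variance lists.

```python
import math

def overlap(mean_a, var_a, mean_b, var_b):
    """Assumed overlap score for two diagonal-covariance normal distributions.

    var_a[i], var_b[i]: the (i, i) covariance components, i.e. variances.
    Returns a value in (0, 1]; larger means more overlap.
    """
    d2 = sum((ma - mb) ** 2 / (va + vb)
             for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b))
    return math.exp(-0.5 * d2)

print(overlap([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))   # 1.0
print(overlap([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]) >
      overlap([0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, 1.0]))   # True
```

Comparing the score of each class pair against a fixed threshold, as in step 303, works because the score is bounded in (0, 1] regardless of the number of dimensions.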

[0021] Next comes step 305, which selects, when two probability distributions with a large overlap are given, which of them is to be corrected. In this embodiment, the distribution with the larger sum of the per-dimension variances, expressed by equation (3), is selected as the correction target.

[0022]

(Equation 3)
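Based on the description in [0021], the selection of step 305 can be sketched as comparing the sums of the diagonal covariance components of the two distributions. Breaking a tie in favor of the first distribution is an assumption; the function names are hypothetical.

```python
def total_variance(var):
    """Sum of the per-dimension variances (diagonal covariance components)."""
    return sum(var)

def select_for_correction(label_a, var_a, label_b, var_b):
    """Return the label of the distribution with the larger total variance."""
    return label_a if total_variance(var_a) >= total_variance(var_b) else label_b

print(select_for_correction("A", [0.5, 2.0], "B", [1.0, 1.0]))  # A
```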

[0023] Next, the probability distribution correction in step 306 will be described. In this embodiment the probability distributions are normal distributions, and a distribution is corrected by re-expressing the single normal distribution before correction as a linear sum of two normal distributions. FIG. 8 illustrates this correction. In FIG. 8, 401 and 402 are overlapping distributions; 403 is the center of distribution 401 and 404 is the center of distribution 402. In the case of FIG. 8, distribution 401 has the larger variance, so distribution 401 is corrected. The x marks in FIG. 8 are the sample data used to create distribution 401. The correction proceeds as follows. First, the sample datum farthest from the center 404 of distribution 402 is found; in FIG. 8 this is 405. Next, a sample datum is found that is far from the center 404 of distribution 402 and lies on the opposite side of the center point 403 of distribution 401 from sample datum 405; in FIG. 8 this is 406. Then each sample datum used to create distribution 401 is classified according to whether it is closer to 405 or to 406, dividing all the sample data into two groups. Finally, the mean and variance of each group are calculated, and a normal distribution is created for each. In FIG. 8, 407 and 408 are the newly created normal distributions. FIG. 9 shows the flow of this probability distribution correction.
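The splitting procedure above can be sketched with NumPy as follows. Several details are assumptions not fixed by the text: distances are Euclidean; "on the opposite side of 403 from 405" is read as a negative dot product (x − 403)·(405 − 403); 406 is taken as the farthest such sample from 404; the mixture weights are proportional to group sizes; and a small variance floor guards against single-sample groups. The function name and data are hypothetical.

```python
import numpy as np

def split_distribution(samples, own_center, rival_center, var_floor=1e-6):
    """Re-express one class's normal distribution as a mixture of two normals.

    samples      : (n, d) samples used to build the distribution being corrected
    own_center   : center of the distribution being corrected (403)
    rival_center : center of the competing distribution (404)
    Returns [(weight, mean, variance), (weight, mean, variance)] (407, 408).
    """
    samples = np.asarray(samples, dtype=float)
    own_center = np.asarray(own_center, dtype=float)
    rival_center = np.asarray(rival_center, dtype=float)
    dist_to_rival = np.linalg.norm(samples - rival_center, axis=1)

    seed1 = samples[np.argmax(dist_to_rival)]                       # 405
    # Candidates for 406: samples on the other side of 403 from 405.
    side = (samples - own_center) @ (seed1 - own_center)
    candidates = np.flatnonzero(side < 0)
    if candidates.size == 0:
        raise ValueError("no sample lies on the opposite side of the center")
    seed2 = samples[candidates[np.argmax(dist_to_rival[candidates])]]  # 406

    # Assign every sample to the nearer seed, then fit a normal to each group.
    near_seed1 = (np.linalg.norm(samples - seed1, axis=1)
                  <= np.linalg.norm(samples - seed2, axis=1))
    components = []
    for group in (samples[near_seed1], samples[~near_seed1]):
        weight = len(group) / len(samples)
        mean = group.mean(axis=0)
        var = np.maximum(group.var(axis=0), var_floor)
        components.append((weight, mean, var))
    return components

data = [[-3.0, 0.0], [-2.0, 0.5], [-2.5, -0.5], [1.0, 0.0], [1.5, 0.3], [2.0, -0.2]]
parts = split_distribution(data, np.mean(data, axis=0), [3.0, 0.0])
for weight, mean, var in parts:
    print(weight, mean.round(2), var.round(3))
```

Both groups are always non-empty: 405 falls in the first group and 406 in the second, because 406 lies strictly on the other side of 403 and so cannot coincide with 405.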

[0024] Next, another implementation of the method of correcting the probability distributions that make up a standard pattern, the main point of the present invention, will be described. The method described with reference to FIGS. 6 to 9 reduces the overlap between the probability distributions that make up the standard patterns. The method described here instead actually performs recognition, extracts the distribution overlaps that cause misrecognition, and reduces only those overlaps. FIG. 10 is a flowchart illustrating the overall flow of this alternative probability distribution correction method.

[0025] First, probability distributions are created by a conventional standard method (601). The resulting probability distribution of each class is used to recognize evaluation sample data (602). Next, for each misrecognized datum, the probability distribution of the input datum's class and that of the misrecognized result's class are regarded as overlapping, and the number of such distribution overlaps is counted (603). It is then determined whether this count K is greater than a preset threshold Kth (604); if it is not, the process ends. If K is greater than Kth, those probability distributions are selected as distributions to be corrected (605) and corrected (606). After the selected distributions have been corrected, the process of recognizing the evaluation sample data again (602), counting the distribution overlaps for the misrecognized data in the same way (603), and determining whether that number is greater than the threshold Kth (604) is repeated. The process ends when the number of overlapping probability distributions becomes Kth or less. Processing steps 604, 605, and 606 are almost the same as processing steps 304, 305, and 306 of FIG. 6 described above, with slight differences. The overlap lies between the distribution of the input datum's class and that of the recognized class; since avoiding misrecognition requires lowering the probability values of the recognized class, step 605 selects the probability distribution of the recognized class as the correction target. The correction in step 606 is likewise almost the same as step 306, except that whereas step 306 uses the center point p of the competing class's distribution, as shown in the flowchart of FIG. 9, step 606 uses the center point of the set of misrecognized input data samples instead.
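Steps 603 and 605 of this second method can be sketched as follows, given the true and recognized labels of the evaluation samples. Two readings are assumptions: K counts distinct (true class, recognized class) pairs rather than individual misrecognized samples, and the center used in step 606 is computed per recognized class over all samples wrongly recognized as that class. The function name and data are hypothetical.

```python
from collections import defaultdict

def misrecognition_overlaps(samples, true_labels, predicted_labels):
    """Return (K, {recognized class to correct: centroid of its misrecognized samples})."""
    pairs = set()
    captured = defaultdict(list)
    for x, truth, guess in zip(samples, true_labels, predicted_labels):
        if truth != guess:
            pairs.add((truth, guess))     # one overlap per distinct class pair
            captured[guess].append(x)     # step 605: the recognized class is corrected
    # Centroid of the misrecognized inputs replaces the rival center p in step 606.
    centroids = {label: [sum(col) / len(xs) for col in zip(*xs)]
                 for label, xs in captured.items()}
    return len(pairs), centroids

samples = [[0.0, 0.0], [1.0, 1.0], [1.5, 1.5], [2.0, 2.0], [3.0, 3.0]]
truth = ["a", "a", "a", "b", "b"]
guess = ["a", "b", "b", "b", "a"]
k, centroids = misrecognition_overlaps(samples, truth, guess)
print(k, centroids)
```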

[0026]

As described above, according to the present invention, the overlap between recognition target classes of the probability distributions that make up the standard patterns can be reduced, which enables highly accurate pattern recognition and speech recognition.

[Brief description of the drawings]

【図１】本発明を適用した単語音声認識装置の一実施例
の構成を示すブロック図。FIG. 1 is a block diagram showing a configuration of an embodiment of a word speech recognition apparatus to which the present invention is applied.

【図２】本発明を適用した単語音声認識装置で用いる認
識基本単位の隠れマルコフモデルを説明する図。FIG. 2 is a diagram illustrating a hidden Markov model of a basic recognition unit used in the word speech recognition apparatus to which the present invention has been applied.

【図３】本発明を適用した単語音声認識装置で用いる認
識基本単位の隠れマルコフモデルを単語辞書にしたがっ
て連結する様子を説明する図。FIG. 3 is a view for explaining a state in which a hidden Markov model of a basic recognition unit used in the word speech recognition apparatus to which the present invention is applied is connected in accordance with a word dictionary.

【図４】本発明の標準パタンの学習方法を説明するフロ
ーチャート。FIG. 4 is a flowchart illustrating a method for learning a standard pattern according to the present invention.

【図５】本発明の標準パタンの学習方法におけるパラメ
タ推定処理を説明するフローチャート。FIG. 5 is a flowchart for explaining parameter estimation processing in the standard pattern learning method of the present invention.

【図６】本発明の確率分布の修正方法の全体フローを説
明するフローチャート。FIG. 6 is a flowchart illustrating an overall flow of a probability distribution correction method according to the present invention.

【図７】本発明を適用した単語音声認識装置で用いる認
識基本単位の確率分布の重なり具合の測定方法を説明す
る図。FIG. 7 is a view for explaining a method of measuring the degree of overlap of the probability distributions of the basic recognition units used in the word speech recognition apparatus to which the present invention is applied.

【図８】本発明の確率分布の修正方法を説明する図。FIG. 8 is a diagram illustrating a method for correcting a probability distribution according to the present invention.

FIG. 9 is a flowchart illustrating the probability distribution correction procedure according to the present invention.

FIG. 10 is a flowchart illustrating the overall flow of another probability distribution correction method according to the present invention.

[Explanation of symbols]

1 ... voice input means
2 ... voice analysis means
3 ... matching means
4 ... judgment means
5 ... standard pattern storage means
6 ... word dictionary
7 ... standard pattern concatenation means
101 ... initial model creation processing
102 ... parameter re-estimation processing
204 ... Forward-Backward algorithm
302 ... processing for measuring the overlap of distributions between classes
306 ... probability distribution correction processing
401 ... probability distribution to be corrected
402 ... probability distribution of the competing class
403 ... center point of the probability distribution to be corrected
404 ... center point of the probability distribution of the competing class
405 ... sample data farthest from the center point of the probability distribution of the competing class
406 ... sample data that is far from the center point of the probability distribution of the competing class and lies on the opposite side of center point 403 (of the probability distribution to be corrected) from sample data 405
407, 408 ... new probability distributions created by the correction
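Reference numerals 403 to 406 describe how the correction in FIG. 8 picks two sample points. A minimal sketch of that selection, assuming samples are points compared by Euclidean distance, and interpreting "opposite side of center point 403" (an assumption) as a negative projection onto the direction from 403 to 405. The function name is hypothetical, and the sketch deliberately stops before building the new distributions 407 and 408, whose construction is not given in this section.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


def select_correction_samples(
    samples: Sequence[Point],
    center_corrected: Point,   # center point 403
    center_competing: Point,   # center point 404
) -> Tuple[Point, Optional[Point]]:
    """Return (sample 405, sample 406) from the samples of the distribution to be corrected.

    405: the sample farthest from the competing class's center (404).
    406: among samples on the opposite side of 403 from 405 (negative projection
         onto the 403 -> 405 direction, an assumed reading), the one farthest from 404.
    """
    s405 = max(samples, key=lambda s: math.dist(s, center_competing))
    direction = [a - c for a, c in zip(s405, center_corrected)]

    def projection(s: Point) -> float:
        # Signed projection of (s - 403) onto the direction from 403 to 405.
        return sum((a - c) * d for a, c, d in zip(s, center_corrected, direction))

    opposite = [s for s in samples if projection(s) < 0.0]
    if not opposite:
        return s405, None
    s406 = max(opposite, key=lambda s: math.dist(s, center_competing))
    return s405, s406


# Hypothetical 2-D data: a distribution spread vertically, competitor to the right.
s405, s406 = select_correction_samples(
    [(0.0, 3.0), (0.0, -2.5), (0.5, 0.0), (1.0, 0.5)],
    center_corrected=(0.0, 0.0),
    center_competing=(4.0, 0.0),
)
```

With this data the two selected samples sit at the two ends of the distribution to be corrected, away from the competing class. `math.dist` requires Python 3.8 or later.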

[Claims]

1. A pattern recognition device comprising: feature extraction means for analyzing input data of a recognition target to extract a feature vector; standard pattern storage means for storing, as a standard pattern, a probability distribution of the feature vector for each class of the recognition target; and matching means for calculating a probability for the feature vector of the input data using the probability distribution serving as the standard pattern of each class, the device performing recognition based on the probability value for each class output by the matching means, wherein the probability distribution of the feature vector serving as the standard pattern of each class is corrected according to the degree of overlap between the classes of those probability distributions.

2. The pattern recognition device according to claim 1, wherein the probability distribution of the feature vector serving as the standard pattern of each class is corrected using a linear sum of a plurality of probability distributions.

3. The pattern recognition device according to claim 1, wherein the probability distribution of the feature vector serving as the standard pattern of each class is corrected using the input data of each class that was used to create the probability distribution serving as the standard pattern of that class.

4. The pattern recognition device according to claim 1, wherein the probability distribution of the feature vector serving as the standard pattern of each class is corrected using input data different from the input data of each class that was used to create the probability distribution serving as the standard pattern of that class.

5. The pattern recognition device according to claim 1, 2, 3, or 4, wherein the probability distribution of the feature vector serving as the standard pattern of each class is corrected only when the result of pattern recognition is a misrecognition, and only for the probability distribution that caused the misrecognition.

6. A speech recognition apparatus comprising: feature extraction means for analyzing input speech at regular time intervals to extract a time series of feature vectors; standard pattern storage means for storing, as the standard pattern of each class to be recognized, a time series of probability distributions of feature vectors; and matching means for calculating a cumulative probability for the feature vectors of the input speech using the probability distribution time series serving as the standard pattern of each class, the apparatus performing recognition based on the cumulative probability value for each class output by the matching means, wherein the probability distribution of the feature vector that is a component of the standard pattern of each class is corrected according to the degree of overlap between the classes of those probability distributions.

7. The speech recognition apparatus according to claim 6, wherein the probability distribution of the feature vector that is a component of the standard pattern of each class is corrected using a linear sum of a plurality of probability distributions.

8. The speech recognition apparatus according to claim 6, wherein the probability distribution of the feature vector that is a component of the standard pattern of each class is corrected using the input speech of each class that was used to generate the probability distribution of that component.

9. The speech recognition apparatus according to claim 6, wherein the probability distribution of the feature vector that is a component of the standard pattern of each class is corrected using input speech different from the input speech of each class that was used to generate the probability distribution of that component.

10. The speech recognition apparatus according to claim 6, wherein the probability distribution of the feature vector that is a component of the standard pattern of each class is corrected only when the result of speech recognition is a misrecognition.
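Claims 6 and 7 combine two ideas: each class is a time series of probability distributions scored by a cumulative probability, and a distribution may be replaced by a linear sum of distributions. A minimal sketch, assuming one-dimensional features, a left-to-right model with fixed stay/advance transition probabilities (hypothetical values), and best-path (Viterbi) scoring as the cumulative probability; the patent's matching means may accumulate probability differently. Every function name here is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One state's output distribution: a linear sum of weighted Gaussians,
# given as (weight, mean, std) per component. A single Gaussian is [(1.0, m, s)].
Mixture = List[Tuple[float, float, float]]


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_log_pdf(x: float, mixture: Mixture) -> float:
    """Log density of a linear sum of Gaussians whose weights sum to 1."""
    if not math.isclose(sum(w for w, _, _ in mixture), 1.0):
        raise ValueError("mixture weights must sum to 1")
    p = sum(w * gaussian_pdf(x, m, s) for w, m, s in mixture)
    return math.log(p) if p > 0.0 else float("-inf")


def cumulative_log_prob(
    frames: Sequence[float],
    states: Sequence[Mixture],
    log_stay: float = math.log(0.5),  # hypothetical self-loop probability
    log_next: float = math.log(0.5),  # hypothetical advance probability
) -> float:
    """Best-path cumulative log probability through a left-to-right model that
    starts in the first state and must end in the last state."""
    neg_inf = float("-inf")
    if not frames or not states:
        return neg_inf
    score = [neg_inf] * len(states)
    score[0] = mixture_log_pdf(frames[0], states[0])
    for x in frames[1:]:
        new = [neg_inf] * len(states)
        for j, state in enumerate(states):
            best = score[j] + log_stay
            if j > 0:
                best = max(best, score[j - 1] + log_next)
            if best > neg_inf:
                new[j] = best + mixture_log_pdf(x, state)
        score = new
    return score[-1]


def recognize(frames: Sequence[float], models: Dict[str, Sequence[Mixture]]) -> str:
    """Return the class whose model gives the highest cumulative log probability."""
    return max(models, key=lambda name: cumulative_log_prob(frames, models[name]))


# Hypothetical two-word vocabulary with single-Gaussian states.
models = {
    "up": [[(1.0, 0.0, 1.0)], [(1.0, 5.0, 1.0)]],
    "down": [[(1.0, 5.0, 1.0)], [(1.0, 0.0, 1.0)]],
}
result = recognize([0.0, 0.0, 5.0, 5.0], models)
```

Replacing a state's single Gaussian with a two-component linear sum such as `[(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]` lowers its density near the original mean, which is one way a corrected distribution can cover its own samples while scoring less of the region between them.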